Undefined value
In computing (particularly in programming), an undefined value is a condition where an expression does not have a correct value, although it is syntactically correct. An undefined value must not be confused with the empty string, Boolean "false" or other "empty" (but defined) values. Depending on circumstances, evaluation to an undefined value may lead to an exception or undefined behaviour, but in some programming languages undefined values can occur during a normal, predictable course of program execution.
Dynamically typed languages usually treat undefined values explicitly when possible. For instance, Perl has the undef operator,[1] which can "assign" such a value to a variable. In other type systems an undefined value can mean an unknown, unpredictable value, or merely a program failure on an attempt to evaluate it. Nullable types offer an intermediate approach; see below.
Handling
The value of a partial function is undefined when its argument is outside its domain of definition. This includes numerous arithmetical cases such as division by zero and the square root or logarithm of a negative number. Other common examples are accessing an array with an index that is out of bounds, and looking up a key that an associative array does not contain. There are various ways that these situations are handled in practice:
Reserved value
In applications where undefined values must be handled gracefully, it is common to reserve a special null value which is distinguishable from normal values. This resolves the difficulty by creating a defined value to represent the formerly undefined case. There are many examples of this:
- The C standard I/O library reserves the special value EOF to indicate that no more input is available. The getchar() function returns the next available input character, or EOF if there is no more available. (The ASCII character code defines a null character for this purpose, but the standard I/O library wishes to be able to send and receive null characters, so it defines a separate EOF value.)
- The IEEE 754 floating-point arithmetic standard defines a special "not a number" value which is returned when an arithmetic operation has no defined value. Examples are zero divided by zero, or the square root of a negative number.
- Structured Query Language has a special NULL value to indicate missing data.
- The Perl language lets the definedness of an expression be checked via the defined() predicate.[2]
- Many programming languages support the concept of a null pointer distinct from any valid pointer, and often used as an error return.
- Some languages allow most types to be nullable, for example C#.[3]
- Most Unix system calls return the special value −1 to indicate failure.
While dynamically typed languages often ensure that uninitialized variables default to a null value, statically typed languages often do not, and distinguish null values (which are well-defined) from uninitialized values (which are not).[3]
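The reserved-value approach can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names checked_sqrt and find_age and the sample data are hypothetical, chosen only to show a NaN, a None and a −1 each standing in for "no value".

```python
import math
from typing import Dict, Optional


def checked_sqrt(x: float) -> float:
    """Square root that returns the reserved value NaN outside its domain."""
    if x < 0:
        return math.nan
    return math.sqrt(x)


def find_age(ages: Dict[str, int], name: str) -> Optional[int]:
    """Look up a key, returning the reserved value None when it is absent."""
    return ages.get(name)


ages = {"ada": 36}                 # hypothetical sample data
root = checked_sqrt(-1.0)          # NaN: a defined value meaning "no result"
missing = find_age(ages, "bob")    # None: the key is not in the mapping
position = "undefined".find("z")   # str.find reserves -1 for "not found"
difference = math.inf - math.inf   # float arithmetic yields NaN here
```

Because NaN compares unequal to everything, including itself, a caller has to test for it with math.isnan(root) rather than with ==.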
Exception handling
Some programming languages have a concept of exception handling for dealing with failure to return a value. The function returns in a defined way, but it does not return a value, so there is no need to invent a special value to return.
A variation on this is signal handling, which is done at the operating system level and not integrated into a programming language. Signal handlers can attempt some forms of recovery, such as terminating part of a computation, but without as much flexibility as fully integrated exception handling.
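As a rough Python sketch of the exception-based approach, the function below raises instead of inventing a value to return. The names reciprocal_sqrt and describe are hypothetical; the built-in exceptions shown at the end are the ones Python itself raises for out-of-bounds indexing, missing keys and division by zero.

```python
import math


def reciprocal_sqrt(x: float) -> float:
    """Return 1/sqrt(x); raise instead of returning when x is out of the domain."""
    if x <= 0:
        raise ValueError(f"reciprocal_sqrt is undefined for x = {x}")
    return 1.0 / math.sqrt(x)


def describe(x: float) -> str:
    """Caller side: either format the value or report that there is none."""
    try:
        return f"{reciprocal_sqrt(x):.3f}"
    except ValueError as err:
        return f"no value: {err}"


# Built-in operations signal their own undefined cases the same way.
outcomes = []
for thunk in (lambda: [1, 2, 3][5], lambda: {"a": 1}["b"], lambda: 1 / 0):
    try:
        thunk()
    except (IndexError, KeyError, ZeroDivisionError) as err:
        outcomes.append(type(err).__name__)
```

The caller never sees a placeholder result: control either continues with a real value or transfers to the handler.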
Non-returning functions
A function which never returns has an undefined value because the value can never be observed. Such functions are formally assigned the bottom type, which has no values. Examples fall into two categories:
- Functions which loop forever. This may arise deliberately, or as a result of a search for something which will never be found. (For example, in the case of a failed μ operator in a partial recursive function.)
- Functions which terminate the computation, such as the exit system call. From within the program, this is indistinguishable from the preceding case, but it makes a difference to the invoker of the program.
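Both categories can be annotated in Python with typing.NoReturn, which type checkers treat as a type with no values. The sketch below is illustrative only; the function names spin_forever and terminate are hypothetical.

```python
import sys
from typing import NoReturn


def spin_forever() -> NoReturn:
    """First category: a loop that never produces a value (not called here)."""
    while True:
        pass


def terminate(status: int) -> NoReturn:
    """Second category: end the computation instead of returning."""
    sys.exit(status)  # raises SystemExit, so control never returns to the caller
```

In Python the termination is implemented as a SystemExit exception, so an enclosing try block can still observe it, which is a detail of this language rather than of non-returning functions in general.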
Undefined behaviour
All of the preceding methods of handling undefined values require that the undefinedness be detected. That is, the called function determines that it cannot return a normal result and takes some action to notify the caller. At the other end of the spectrum, undefined behaviour places the onus on the caller to avoid calling a function with arguments outside of its domain. There is no limit on what might happen. At best, the result is an easily detectable crash; at worst, it is a subtle error in a seemingly unrelated computation.
(The formal definition of "undefined behaviour" includes even more extreme possibilities, including things like "halt and catch fire" and "make demons fly out of your nose".[4])
The classic example is a dangling pointer reference. It is very fast to dereference a valid pointer, but can be very complex to determine if a pointer is valid. Therefore, computer hardware and low-level languages such as C do not attempt to validate pointers before dereferencing them, instead passing responsibility to the programmer. This offers speed at the expense of safety.
Undefined value sensu stricto
The strict definition of an undefined value is a superficially valid (non-null) output which is meaningless but does not trigger undefined behaviour. For example, passing a negative number to the fast inverse square root function will produce a number. It is not a very useful number, but the computation will complete and return something.
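The fast inverse square root case can be reproduced approximately in Python. This is a sketch under stated assumptions: it reinterprets the 32-bit float as an unsigned integer and performs the Newton step in Python's double precision, so the exact numbers differ from those of the well-known C routine. The function name fast_inverse_sqrt is chosen here for illustration.

```python
import struct

MAGIC = 0x5F3759DF  # constant used by the well-known bit-level approximation


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) by bit manipulation plus one Newton step."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Assumption: unsigned arithmetic, wrapped to 32 bits.
    bits = (MAGIC - (bits >> 1)) & 0xFFFFFFFF
    # Reinterpret the adjusted bits as a float32 again.
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson refinement step.
    return y * (1.5 - 0.5 * x * y * y)


useful = fast_inverse_sqrt(4.0)        # close to 0.5
meaningless = fast_inverse_sqrt(-4.0)  # a float, but not a meaningful one
```

For the negative argument no error is raised and no reserved value is returned; the call simply yields a number with no mathematical significance.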
Undefined values occur particularly often in hardware. If a wire is not carrying useful information, it still exists and has some voltage level. The voltage should not be abnormal (e.g. not a damaging overvoltage), but the particular logic level is unimportant.
The same situation occurs in software when a data buffer is provided but not completely filled. For example, the C library strftime function converts a timestamp to human-readable form in a supplied output buffer. If the output buffer is not large enough to hold the result, an error is returned and the buffer's contents are undefined.
In the other direction, the open system call in POSIX takes three arguments: a file name, some flags, and a file mode. The file mode is only used if the flags include O_CREAT. It is common to use a two-argument form of open, which provides an undefined value for the file mode, when O_CREAT is omitted.
Sometimes it is useful to work with such undefined values in a limited way. The overall computation can still be well-defined if the undefined value is later ignored.
As an example of this, the C language permits converting a pointer to an integer, although the numerical value of that integer is undefined. It may still be useful for debugging, for comparing two pointers for equality, or for creating an XOR linked list.
Safely handling undefined values is important in optimistic concurrency control systems, which detect race conditions after the fact. For example, reading a shared variable protected by a seqlock will produce an undefined value before determining that a race condition happened. The reader will then discard the undefined data and retry the operation. This produces a defined result as long as the operations performed on the undefined values do not produce full-fledged undefined behaviour.
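The read-and-retry protocol of a seqlock can be outlined in Python as below. This is only a structural sketch with a hypothetical class name, SeqLockedPair: it assumes a single writer and omits the memory barriers and atomic operations a real implementation needs, so it should not be taken as a working concurrent data structure.

```python
from typing import Tuple


class SeqLockedPair:
    """Sketch of a single-writer sequence lock around two related fields."""

    def __init__(self, x: int, y: int) -> None:
        self.sequence = 0  # even: data stable, odd: write in progress
        self.x = x
        self.y = y

    def write(self, x: int, y: int) -> None:
        self.sequence += 1  # becomes odd: readers must not trust the data
        self.x = x
        self.y = y
        self.sequence += 1  # becomes even again: data is consistent

    def read(self) -> Tuple[int, int]:
        while True:
            before = self.sequence
            # This copy may be torn (undefined) if a write overlaps it,
            # so it is only copied here, never interpreted.
            snapshot = (self.x, self.y)
            after = self.sequence
            if before == after and before % 2 == 0:
                return snapshot  # no write interfered: the copy is defined
            # Otherwise discard the possibly undefined snapshot and retry.


pair = SeqLockedPair(1, 2)
first = pair.read()
pair.write(3, 4)
second = pair.read()
```

The important point is that the reader does nothing with the snapshot except copy it until the sequence check has confirmed that it is valid.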
Other examples of undefined values being useful are random number generators and hash functions. The specific values returned are undefined, but they have well-defined properties and may be used without error.
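A short Python illustration of this point follows, using the standard random module and the built-in hash function. The variable names are arbitrary. The specific numbers produced are not specified, and string hashes typically vary from one interpreter run to the next, but the properties checked in the comments hold regardless.

```python
import random

rng = random.Random()     # seeded from an unspecified entropy source
sample = rng.random()     # unspecified value, but always in [0.0, 1.0)
roll = rng.randint(1, 6)  # unspecified value, but always in 1..6 inclusive

h1 = hash("undefined")    # the actual integer may differ between runs
h2 = hash("undefined")    # equal objects hash equally within one run
```

Code that relies only on these guaranteed properties, such as a hash table or a simulation, remains well-defined even though the individual values are not.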
Notation
In computability theory, undefinedness of an expression is denoted as expr↑, and definedness as expr↓.
See also
- Defined and undefined (mathematics)
- Null (SQL)
References
- [1] "undef". Perl 5 documentation. 2009-09-25. Retrieved 2010-03-26.
- [2] "defined". Perl 5 documentation. 2009-09-25. Retrieved 2010-03-26.
- [3] Carr, Richard (2006-10-01). "C# Nullable Numeric Data Types". C# Fundamentals tutorial. Retrieved 2010-03-27.
- [4] "Nasal demons". Jargon File. Retrieved 2014-06-12.