Spurred by the discussion of `nothing` in https://twitter.com/janssens_bart/status/1285506277448327171, I thought it might be fun to let others partake in the reification of metaphysics that is programming language design and do a (not so) brief survey of the different notions of nothingness in #julialang
Before we get started, this is all in the docs (and even the faq - https://github.com/JuliaLang/julia/blob/3bef6ebfb52b779b4957ec9e84810b38c78a40de/doc/src/manual/faq.md#nothingness-and-missing-values), but I thought you might like a brief tour regardless
First up, there are the empty immutable (singleton) structs. These have neither content nor identity. This means that there is only one value of each of these types. Or, equivalently, that there is no program that can distinguish two values of one of these types (we say the values are "egal").
Incidentally, identity/equality are a whole other can of worms that I'll try to mostly sidestep here. Maybe another time. Suffice it to say equality is hard.
Some of these singletons have additional nothing-y meaning by convention (i.e. how functions in the standard library operate on them): `nothing`, `missing`, `()` (the tuple with no fields), and `(;)` (the tuple with no fields, but if it had fields they would have names).
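A minimal sketch of what "no content, no identity" looks like in practice. `Empty` is a made-up type, purely for illustration:

```julia
# Hypothetical empty immutable struct, used only for illustration.
struct Empty end

a = Empty()
b = Empty()

same_value = (a === b)                      # true: the two values are egal
no_content = (sizeof(Empty) == 0)           # true: there is nothing to store
same_id    = (objectid(a) == objectid(b))   # true: no identity to tell them apart
```

However many times you "construct" an `Empty`, you get the one and only value of that type.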
The `nothing` singleton/value (ontological nothingness) represents that there was no value. Using it in most computational contexts is an error. `nothing` in particular is special-cased in the system in a few ways:
First, it is the value returned if you type `return` with no value. Similarly, it is the value of expressions that have no value to return, such as empty blocks (`x = if false; end` will set `x` to the value `nothing`).
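Here is a small sketch of both cases. The function name `early_exit` is hypothetical:

```julia
# Hypothetical function: a bare `return` yields `nothing`.
function early_exit(x)
    x < 0 && return      # no value given, so the caller receives `nothing`
    return x
end

r = early_exit(-1)       # r is `nothing`
x = if false; end        # untaken branch with no else: x is `nothing`
y = begin end            # an empty block also evaluates to `nothing`
```

`isnothing(r)` (or `r === nothing`) is the usual way to check for this afterwards.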
Second, it is special cased in printing (which is what prompted the discovery in the original linked tweet), in that it doesn't print when printed (as opposed to printing the string "nothing").
The next nothing-y singleton is `missing` (epistemological/statistical nothingness). This value represents that there may be a value, but we don't know what it is. This convention impacts how it propagates, in that most uses of it just return another `missing` rather than error.
In particular, `missing` has the (somewhat questionable) distinction that equality comparisons with `missing` always return `missing` rather than a boolean, as most other equality comparisons would (yes, the compiler hates this as much as you'd think it does).
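To make the propagation difference concrete, here is a short sketch comparing `missing` with `nothing`, using only Base functions:

```julia
a = missing + 1                 # propagates: a is `missing`
b = (missing == 1)              # `missing`, not `false`
c = (missing == missing)        # still `missing`
d = isequal(missing, missing)   # `true`: isequal always returns a Bool
e = (missing === missing)       # `true`: egality is unaffected

# By contrast, `nothing` does not propagate; using it in arithmetic throws.
nothing_threw = try
    nothing + 1
    false
catch err
    err isa MethodError
end
```

A practical consequence is that `if x == 1` errors when `x` is `missing`, since `missing` is not a `Bool`; `isequal` or `ismissing` are the usual ways around that.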
`()` and `(;)` are fairly self-explanatory, but they're worth mentioning. In other systems `()` is sometimes used in much the same way we use `nothing`, but those systems often have a less strong distinction between `x` and `(x,)`.
In #julialang, we don't make this identification, because to us `()` is still a value that is useful in ways that nothing isn't and we don't want to confuse the use cases. E.g. `f(()...)` makes sense, `f(nothing...)` does not.
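A quick sketch of that difference. `count_args` and `count_kwargs` are hypothetical helpers, and I'm assuming a Julia version recent enough to parse the `(;)` literal (otherwise `NamedTuple()` is the same value):

```julia
# Hypothetical helpers that just count what they were given.
count_args(args...) = length(args)
count_kwargs(; kwargs...) = length(kwargs)

n_pos = count_args(()...)          # 0: splatting the empty tuple is fine
n_kw  = count_kwargs(; (;)...)     # 0: same for the empty named tuple

# `nothing` is not a container, so there is nothing to splat.
splat_failed = try
    count_args(nothing...)
    false
catch err
    err isa MethodError
end
```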
It is worth mentioning at this point that `nothing` and `missing` are not special (other than that they happened to be defined in the standard library). You are allowed and encouraged to define your own notions of nothingness and packages regularly do.
What other notions of nothingness might you need? Well, here are a few (fictional, I'm making up the names as I go, there may be better names for these concepts upon deeper consideration):
`unknown` - There was a value, but we don't know what it was (like `missing`, but without the possibility that the value may have been absent/unknowable/`nothing`)
`unavailable` - We don't know the value, but it can be determined (by some computation or by asking a server or database).
`redacted` - There was a value, but the requester is not allowed to know what the value was.
`glomar` - There may or may not be a value, but the requester is not allowed to know which is the case.
One of the definitive outcomes of the pre-1.0 "big nothingness debate" (aside from nailing down exactly what `nothing`/`missing` were supposed to mean) was that being allowed to define your own notion of nothingness is really important (and is super fast).
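As a rough sketch of what rolling your own might look like, here is a `redacted` singleton that propagates through `+` the way `missing` does. Everything here (`Redacted`, `redacted`, `isredacted`, and the choice to propagate) is made up for illustration; a real package would decide its own semantics:

```julia
# Hypothetical user-defined notion of nothingness.
struct Redacted end
const redacted = Redacted()

Base.show(io::IO, ::Redacted) = print(io, "redacted")
isredacted(x) = x === redacted

# Assumed convention: like `missing`, `redacted` propagates through addition.
Base.:+(::Redacted, ::Number)   = redacted
Base.:+(::Number,   ::Redacted) = redacted
Base.:+(::Redacted, ::Redacted) = redacted

total = (100 + redacted) + 300   # redacted
shown = repr(redacted)           # "redacted"
```

Because `Redacted` is an ordinary singleton type, dispatch on it works exactly as it does for `Nothing` or `Missing`.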
A few ancillary notes before we move on to other kinds of nothingness. `NaN` (Not a Number) could be considered a kind of nothingness, and it does behave a lot like `missing` in that it propagates, but from our perspective NaN is really just another floating point number.
It is possible that lifting NaN to the type system, e.g. by removing NaN from the Float types and using `missing` (or a proper NaN singleton) where IEEE would use NaN would be a better design, but we follow IEEE closely for consistency here.
Incidentally NaN wreaks havoc with equality, since NaN values can be egal, but not equal, which is unusual and in Julia necessitates the introduction of yet another notion of equality (e.g. for dict keys) - but as I said that's a different topic.
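For the curious, the three notions side by side, using only Base functions:

```julia
ieee_equal = (NaN == NaN)        # false: IEEE comparison
egal       = (NaN === NaN)       # true: same bits, so no program can distinguish them
dict_equal = isequal(NaN, NaN)   # true: the notion used for dict keys

d = Dict(NaN => "found")
lookup = d[NaN]                  # "found", since Dict compares keys with isequal

# Signed zeros go the other way around.
zeros_equal   = (-0.0 == 0.0)        # true
zeros_isequal = isequal(-0.0, 0.0)   # false
```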
Also, each function (e.g. `sin`) in Julia (other than closures that have captured contents) is the same kind of singleton object; we just don't really associate them with nothingness, because they're not used in that way.
Worth noting that in some languages, rather than having a `nothing` value, not returning a value is an error. This is particularly true for languages that model side effects closely, since a computation that neither returns a value nor has side effects is useless.
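You can poke at this with ordinary reflection. `make_adder` below is a hypothetical closure factory:

```julia
# Hypothetical closure factory: the returned function captures `n`.
make_adder(n) = x -> x + n

sin_fields   = fieldcount(typeof(sin))             # 0: nothing stored in `sin`
sin_size     = sizeof(typeof(sin))                 # 0
adder_fields = fieldcount(typeof(make_adder(1)))   # 1: the captured `n`
```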
In other languages there is a `no value` type, but the type is uninhabited (think `void` in C++, which can be the return type of a function, but there is no `void` value). (More to come shortly, the twitter editor is forcing me to tweet this batch).
Alright, next we can briefly look at the empty *mutable* types. These types have no contents, but do have identity. This doesn't come up super often, but they're useful as unforgeable references to external resources. In some sense they represent pure identity without content.
E.g. suppose a network resource gives you an integer that you need to free when it is no longer referenced. You could just give out the integer, but that's immutable and has no lifetime, so there's no way to tell whether it's still referenced somewhere or not.
You could wrap the integer in a mutable struct, which gives it identity and lifetime (so you can register a finalizer, etc.), but also makes it forgeable (e.g. by deepcopying it or serialization), so you risk somebody accidentally holding on to a resource you freed.
By giving out an empty mutable token, and having a (weak) lookup table to the network handle, you can be guaranteed that when the lifetime of that token ends, there are no other references to this handle, so you can safely release the network instance.
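A minimal sketch of the token pattern, under some loud assumptions: all names are hypothetical, `RELEASED` stands in for whatever call would really free the handle, and the finalizer captures the raw handle directly rather than consulting the table (one possible arrangement, chosen here to keep the finalizer trivial):

```julia
# Hypothetical names throughout; `RELEASED` stands in for a real "free this handle" call.
mutable struct HandleToken end                     # no fields: pure identity

const HANDLES  = WeakKeyDict{HandleToken,Int}()    # weak table: token => raw handle
const RELEASED = Int[]

function open_resource(raw::Int)
    tok = HandleToken()
    HANDLES[tok] = raw
    # Runs when `tok` becomes unreachable; it captures `raw`, not the table.
    finalizer(_ -> push!(RELEASED, raw), tok)
    return tok
end

tok      = open_resource(42)
handle   = HANDLES[tok]                            # 42: the real token finds its handle
distinct = (HandleToken() !== HandleToken())       # true: each token has its own identity
forged   = deepcopy(tok)                           # a copy is a *different* token
forgery_works = haskey(HANDLES, forged)            # false: the copy maps to nothing
```

Since the token has no content, copying or serializing it cannot smuggle the handle anywhere; only the original object is a key in the table.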
Alright, moving on to the type system. We have `Bottom` (aka `Union{}`) which is the bottom of the type lattice (least element along the subtyping relation).
It is uninhabited (like `void` in C++), i.e. there is no value `x` such that `isa(x, Union{})` is true. When type inference determines that the type of an expression is `Union{}` this means that the end of the expression is unreachable.
(The expression itself could still be reached, but that would mean that said expression errors.) If the type of an expression is `Union{}`, it either errors, is structurally unreachable (there is no path that leads to it), or is dynamically unreachable (it is dominated by some other `Union{}` expression).
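A small sketch of how this shows up from the outside. `always_throws` is a hypothetical function, and `Base.return_types` is a reflection helper whose results depend on what inference manages to prove, so treat the last line as illustrative rather than guaranteed:

```julia
# Hypothetical function that can never return normally.
@noinline always_throws(x) = error("boom")

is_bottom_subtype = (Union{} <: Int)         # true: bottom is a subtype of everything
inhabited         = (1 isa Union{})          # false: no value has type Union{}
bottom_type       = typeof(Union{})          # Core.TypeofBottom

# Inference's view of the return type of always_throws(::Int).
inferred = Base.return_types(always_throws, (Int,))[1]   # Union{}
```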
Fun side note, there is also the special type `typeof(Union{})` (aka Core.TypeofBottom), which is just special for representational reasons, since usually unions always have at least two elements (there is no one element union).
However, the story doesn't end there, because while `Bottom` is the bottom of the type lattice, it is not the bottom of the inference lattice. That would be `NOT_FOUND`, which indicates that we haven't analyzed a given statement yet, so we don't know what it does.
In some ways `nothing` is to `missing` as `Union{}` is to `NOT_FOUND`.
Side note: In some ways `Any` (the top of the type lattice) is also a kind of nothingness, because it indicates the absence of any specific information. If the type of an expression is `Any`, the compiler has basically no idea what it does/will do.
Interestingly, there are parts of the compiler where we explicitly need to know whether something could have errored, so we track that separately. Usually `Any` implies the possibility that the expression could error, but for variable references, we'd like to exclude that.
(Since otherwise we'd have to generate the error path.) So when typing variables, rather than expressions, the usual lattice elements are taken to exclude `Union{}`, and `MaybeUndef` is used to re-introduce that possibility and generate error paths (variables being undef is rare).
Of course all of these are distinct from `Nothing` and `Missing`, which are the types of expressions that are known to have the value `nothing`/`missing`. From the type system perspective, those values are not special.
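To keep the values and their types apart, a short sketch using only Base functions:

```julia
t_nothing = typeof(nothing)          # Nothing
t_missing = typeof(missing)          # Missing
not_bottom = (Nothing !== Union{})   # true: Nothing is inhabited, Union{} is not

# Ordinary functions return these values like any other.
idx = findfirst(==(5), [1, 2, 3])    # nothing: no match found

# The standard library's helpers for unwrapping each convention.
from_nothing = something(idx, 0)     # 0: first argument that is not `nothing`
from_missing = coalesce(missing, 0)  # 0: first argument that is not `missing`
```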
And I think that's basically it for the survey. I may have missed something, but I never promised to be complete. As you might imagine many an hour of thought has gone into this whole setup by many, many people, but I hope you enjoyed it.
Or are at least happy that you didn't have to think about it yourself. The key thing is that while it's important to have a consistent design philosophy here, most people shouldn't need to know the details. For more advanced users, maybe nothing vs missing, but that's it.
You can follow @KenoFischer.