"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton
"Naming things" is obvious, but why "cache invalidation"? It's hard because it implicitly conflates two distinct notions of time: *physical* time and *logical* time.
"Naming things" is obvious, but why "cache invalidation"? It's hard because it implicitly conflates two distinct notions of time: *physical* time and *logical* time.
Physical time is what we normally think of: wall time, how long something takes to run. Physical time is a cost we want to minimize. Logical time is how the system changes: each state of the system belongs to a different logical-time "era".
Logical time is isomorphic to state change. If the system's state doesn't change, physical time may be passing, but logical time is not. And there's no way to tell if time has passed except by noticing state changes. Time and state are the same thing.
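This idea can be sketched in a few lines. A minimal illustration (all names here are hypothetical, not from any particular library): an "era" counter that ticks only on writes, never on reads, no matter how much wall time passes between them.

```python
class VersionedStore:
    """A toy store whose logical clock is just a count of state changes."""

    def __init__(self):
        self._value = None
        self.era = 0  # logical time: advances only when state changes

    def read(self):
        # Reading changes nothing, so logical time stands still,
        # however much physical time elapses between reads.
        return self._value

    def write(self, value):
        # Every state change is one logical tick.
        self._value = value
        self.era += 1


store = VersionedStore()
store.read()
store.read()          # physical time has passed; era is still 0
store.write("hello")  # era ticks to 1
```

An observer who can only call `read` has no way to detect the passage of time until a `write` lands, which is the isomorphism in miniature.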
(This isn't just true with mutable state in imperative languages. Any system which produces an observable change in either queries to it or to the external world during its execution has time, and therefore state. It's just via a higher layer of abstraction.)
Most programmers, most of the time, are writing things where only one of physical or logical time is "highly relevant". But with caches, both are highly relevant. The only reason we have caches in the first place is to reduce the physical time a query takes to complete.
But now we have two pieces of state, the cache and the storage. If either changes, we're in a new era. If the storage changes first, then we have an era where the cache is incorrect, and we want to minimize the *physical time* until the next logical tick.
But the system is concurrent, so we can't "stop" logical time from passing. We have to juggle minimizing the physical time of invalidation against minimizing the physical time of querying storage (aka why you have a cache in the first place), all while watching out for logical time passing.
(If "storage" doesn't have state, like with memoizing pure functions, this whole problem goes away and you're left with invalidation to deal with memory constraints, which is a hard problem but a hard problem in a different way)
Worth noting that Karlton said this circa 1995, so he may have been thinking of caching in a different context than I am now, but the same essential complexity of mixing physical and logical time persists as a hard problem in computer science: https://skeptics.stackexchange.com/a/39178