OH: Health checks are like bloom filters. A failing health check means a service isn't up, but a health check passing means the service is *probably* "healthy", especially given how health checks are typically done (host level checks or hitting an HTTP endpoint/RPC method).
Which raises the question: how do we even define "health"? For all the talk about embracing failure, and partial failures being the bane of distributed systems, there doesn't seem to be a way to encode that in health checks, which treat the status as a strictly binary outcome.
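A toy sketch of that asymmetry. It assumes a hypothetical `Service` model in which the process can be up while a dependency (a made-up `db`) is failing. `binary_health_check` stands in for a typical endpoint-style check and is not any real library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    # Hypothetical model: the process can answer requests
    # while some of its dependencies are failing.
    process_up: bool
    failing_dependencies: set = field(default_factory=set)


def binary_health_check(svc: Service) -> bool:
    # Like hitting an HTTP endpoint: it only proves the process answered,
    # so every partial failure collapses into "healthy".
    return svc.process_up


degraded = Service(process_up=True, failing_dependencies={"db"})
down = Service(process_up=False)

print(binary_health_check(degraded))  # True, even though the db is failing
print(binary_health_check(down))      # False: a failing check is conclusive
```

As with a bloom filter, the negative answer is trustworthy and the positive one only means "probably".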
Though I guess folks running Kubernetes can leverage a "liveness probe" to configure more interesting checks?
Or perhaps not, since a failing liveness probe gets the container restarted.