Thread by @jerrinot, 1/ Distributed Systems Evening: Everyone has heard of the CAP theorem. The [...]

Jaromir Hamala

jerrinot

1/ Distributed Systems Evening:
Everyone has heard of the CAP theorem. The infamous "2 out of 3" (sic).

While the CAP result has profound consequences it's not all that surprising: "If a process cannot talk to others then it cannot know what is going on."

2/ The slightly less known, but equally as consequential is the "FLP result". It roughly says this:
"There is no algorithm to guarantee a consensus in an asynchronous network if at least one node may fail."

3/ Asynchronous network means messages can be delayed arbitrary, without any upper bound.

Interestingly enough the result does NOT assume lost messages. Just delayed. Yet you cannot have a consensus algorithm!

That's quite a strong result, isn't it?

4/ There is something what used to bother me in the past:
-FLP: You cannot have a consensus in an async model with just 1 failing process.

-Angry me: What about Raft, what about Paxos and its variants? Explain yourself!

5/ -Hand-waving me: Well, Raft/Paxos they all assume partial synchrony, they really on some assumptions regarding synchrony.

-Smart-ass me: Well, if you assume (partial) synchrony then the problem is trivial. But what if the assumptions are violated? What happens then?!

6/ So if you are like the old me here is a better explanation:
Consensus algorithms assume partial synchrony for progress.

They do not assume that for safety. So delayed messages cannot lead to safety violations.

7/ No matter how much they are delayed if a consensus is reached then everyone see the same value.

In other words: The consensus algorithms get away from FLP by guaranteeing only this: "Nothing bad will ever happen".

8/ But to guarantee "Something good will eventually happen" (=reach the consensus) they require partial synchrony. If messages are still arbitrary delayed then consensus may not ever be reached.

9/ Further resources:
The original paper: https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf
FLP paper walk-through: https://www.the-paper-trail.org/post/2008-08-13-a-brief-tour-of-flp-impossibility/
Paper on partial synchrony: https://groups.csail.mit.edu/tds/papers/Lynch/jacm88.pdf

A Brief Tour of FLP Impossibility

One of the most important results in distributed systems theory was published in April 1985 by Fischer, Lynch and Patterson. Their short paper ‘Impossibility of Distributed Consensus with One …

https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf

You can follow @jerrinot.

Tip: mention @twtextapp on a Twitter thread with the keyword “unroll” to get a link to it.

Latest Threads Unrolled: