Thread by @ilyaraz2, Zero-knowledge (ZK) proofs is one of the most exciting discoveries of CS. [...]

Zero-knowledge (ZK) proofs is one of the most exciting discoveries of CS. Today ZK enjoys renaissance due to blockchain and secure multi-party computation. In a slow thread I'll summarize some definitions and maybe even talk about constructions. Like if you are interested! (1/)

Two disclaimers: 1. My knowledge (pun intended) about ZK is very limited despite having a paper that uses it

2. Even if I get to actual constructions, my focus would be on conceptual simplicity, not practicality (expect to see crazy polynomials etc). (2/)

The main theorem of ZK proofs is: "every language in NP allows a ZK proof". There is a lot to unpack here, let's first talk about NP. (3/)

A language is simply an infinite set L = {x} of finite strings. We say that L is in NP if there is an efficient (poly-time) algorithm A(x, y) with the following property: for every x from L, there is y such that A(x, y) = 1. Putting it differently... (4/)

If almighty Alice and computationally weak Bob has a string x from L and Alice wants to convince Bob that indeed x belongs to the language, she can cook up a certificate y that would convince Bob, who can just run A(x, y) and check that it returns 1. (5/)

In practice, many hard problems lie within NP. My favorite example: the language of math statements of length n, which have a proof of the length poly(n). Indeed, Alice can convince Bob just by sending the proof and Bob can quickly check it, but finding the proof is hard. (6/)

Going back to "NP allows zero-knowledge proofs". It means that Alice can convince Bob that x lies in the language L *without leaking any information about the certificate y*. Before explaining what this means, let's just pause and reflect on how ridiculous this sounds. (7/)

Say Alice knows a fairly short proof of Riemann Hypothesis (RH). Shen can convince Bob that the RH is true, but Bob would have no idea what Alice's proof looks like, yet he'll be 99% sure that Alice is not lying. (8/)

Now let's be slightly more formal. In order to achieve the above "zero-knowledge" property, we first need to relax the verification procedure in a number of important ways, which can be summarized under the label of "interactive proofs". Let's talk about those first. (9/)

1. We allow Alice and Bob to have some back-and-forth instead of just sending a single message. 2. We allow Bob to run a *randomized* poly-time algorithm (Alice is still almighty!). Finally, we require our protocol to have two properties, called soundness and completeness.. (10/)

Completeness means that if the string x lies in the language L, then Alice *can* convince Bob with probability 0.99. Soundness is dual: if x does not lie in L, then whatever Alice does, she won't convince Bob with probability more than 0.01. (11/)

So, looks like we mildly generalized NP. Turns out that the class of languages that allows such an interactive proof is much bigger: in fact, it is any language that can be computed in *polynomial memory* (e.g., who wins in generalized n*n Go etc) (12/)

But the fact that we capture a much bigger class of languages is not the main point here. Let's go back to NP. Our claim is that every NP language allows an interactive proof that is "zero-knowledge" (Alice leaks nothing about the certificate y). Let's now formalize it! (13/)

Soundness and completeness were properties of the Bob's algorithm. Zero-knowledge is a property of the Alice's algorithm... (14/)

...: for every x in L, and for every poly-time randomized algorithm A that Bob can run, there is another poly-time randomized algorithm A* ("simulator of A") that does not talk to Alice yet produces output that is computationally hard to tell apart from the output of A. (15/)

So, informally: anything that Bob can learn from interacting with Alice, he could have computed himself. This sounds fishy: if Bob does't need Alice, why can't he decide if x is in L? The answer is that we require the simulator to work only on the strings from our language. (16/)

This concludes the explanation what does the theorem "NP allows ZK proofs" mean. I need to take a break now (meditate, work-out, get some breakfast), then I might continue with some constructions of ZK proofs. Let me know if you made it until here! (17/)

OK, I'm back! Let's now build zero-knowledge interactive proofs for all the languages from NP. (18/)

Instead of handling every NP language separately, we will use the notion of NP-completeness. Turns out, there are languages in NP (NP-complete) such that every other language in NP can be reduced to those in poly-time. More formally, (19/)

L is NP-complete if for every L' in NP there is a poly-time algorithm f such that x in L' iff f(x) in L. If some problem is NP-complete, then it can be seen as an evidence that it's hard (indeed, if it was easy, the whole NP would be). (20/)

E.g., the language I introduced above. There are relatively few natural examples of problems in NP that are neither NP-complete nor a poly-time algorithm is known for those (factoring, graph isomorphism etc), so NP-complete problems are abundant. (21/) https://twitter.com/ilyaraz2/status/1343261585435807744

https://twitter.com/ilyaraz2/status/1343261585435807744

Returning to our zero-knowledge quest, it is enough to construct a zero-knowledge proof for *some* NP-complete problem. This is almost obvious, but there is a little wrinkle if you try to prove it rigorously: we'll need our ZK proof to have a little additional property (22/)

I'm not going to introduce this property here, since it does not matter much, but I want to emphasize that rigorous proofs involving ZK are surprisingly subtle (and even "trivial" facts need to be checked very carefully, we'll encounter one more such fact later). (23/)

That's why I'm trying to lay out the foundations before diving into the examples (unlike most of the existing "informal" expositions of ZK). (24/)

OK, so which NP-complete problem are we going to construct a ZK protocol? There are two standard examples: protocols for 3-colorability and hamiltonicity. Both are combinatorial properties of an undirected graph. (25/)

Graph G is 3-colorable, if we can color its vertices into three colors such that no edge connects two vertices of the same color. G is Hamiltonian if there is a cycle that passes through every vertex exactly ones. Both problems are NP-complete (26/)

I will describe the protocol for 3-colorability, but the protocol for hamiltonicity is also very useful (e.g., it can be massaged into so-called non-iteractive ZK proof) (27/)

So, where are we at? Alice and Bob have a graph, and Alice wants to convince Bob that it is 3-colorable without revealing anything about the coloring. Bob wants to be convinced if only if the graph is indeed 3-colorable. (28/)

We'll need a cryptographic primitive, which is called a "commitment scheme". Say Alice wants to commit to a value x. First, she and Bob invoke the "commit phase", during which Bob learns nothing about x. Then, they may want to invoke the "reveal phase"... (29/)

... where Alice sends x and her private randomness to Bob. Bob can verify that x is indeed the value Alice committed to. In practice, Alice can do this by sending H(x || randomness), where H(.) is a sufficiently strong cryptographic hash functions. In theory, ... (30/)

... commitment schemes can be built from one-way functions via cryptographic pseudo-random generators. (31/)

OK, now that we have commitment schemes, let's finally build a zero-knowledge proof of 3-colorability. (32/)

Alice relabels the coloring (1 -> pi(1), 2 -> pi(2), 3 -> pi(3), where pi is a random permutation of {1, 2, 3}) and commits to colors of all vertices. Bob chooses a random edge and asks Alice to reveal the colors of endpoints. Bob accepts iff the colors are distinct. (33/)

Let's understand why this is a valid protocol. The completeness is trivial: if Alice indeed knows a valid coloring, she can act in a way that will convince Bob with probability 1 (like I described above). (34/)

Soundness readily follows from the definition of a commitment scheme. Since Alice must have committed to *some* coloring, if the graph is not 3-colorable, then it will be discovered with probability at least 1/m, where m is the number of edges. (35/)

This is not so great, since we want Bob to catch Alice with probability 0.99, not 1/m. But it can be remedied by repeating the protocol 100m times one after another, and this amplifies the probability as you would expect. (36/)

Such sequential repetition behaves nicely wrt completeness and soundness, that's easy to show, and intuitively it should preserve zero-knowledge, but that, again, turns out to be more subtle than expected, but can be done with some work in spirit of https://twitter.com/ilyaraz2/status/1343313236595146753 (37/)

https://twitter.com/ilyaraz2/status/1343313236595146753

Finally, it is left to show that our coloring protocol is indeed ZK. Intuitively, it's obvious: all Bob sees are two random colors, which reveal no information about the coloring, however the formal proof is way more subtle. Let's get some flavor of the main difficulties. (38/?)

Let's first recall the formal definition of zero-knowledge. For every Bob's algorithm, there is another algorithm (simulator) that does not interact with Alice yet whose output is hard to tell apart from the true Bob's. (39/?)

Here is the simulator for our protocol:

1. Impersonate Alice by committing to absolutely random colors (of course, not a valid 3-coloring, which we don't know anyway)
2. Run Bob's algorithm and figure out which edge he'd want to query (one important remark: ... (40/)

Bob does not have to query a *random* edge, because our simulator has to handle an "arbitrary Bob", in fact this choice can in principle depend on particular way commitments were made. That's exactly one significant source of difficulty for proving ZK.) (41/?)

Going back to the simulator: after we receive the Bob's query, we reveal the corresponding two colors. If they are the same, we declare the simulation to fail. If they are different, we let Bob continue to operate and output whatever answer he meant to output. (42/?)

There are two immediate questions here:

1. Why on earth are we allowed to fail during the simulation?
2. What's the probability of the said failure? (43/?)

1. Failing is OK as long as the probability of that is not too high (<= 1/2). We can obtain the legit simulator that almost never fails by calling the faulty one until it does not fail. We need to make sure that the output distribution is good *conditioned on not failing*. (44/?)

2. Now, to the probability of failure. If Bob's query was independent of choice of colors, then it would have been at most 1/3. However, in reality, it is not quite the case, since Bob can see the commitments before querying... (45/?)

... here we need to use particular definition of the security of our commitment scheme and argue that we can assume wlog that Bob's query is *essentially* independent of the colors, so this probability is at most 1/3 + eps. (Remember that Bob is a poly-time algorithm) (46/?)

This essential independence can be pushed further to show that this simulator indeed is correct (when it does not fail). The additional difficulty is that Bob's output can depend on a combination of commitments and revealed secret colors. (47/?)

Phew, this was long, sorry! ZK is hard.

This beautiful theorem (from 80s) is just the beginning of a beautiful field. Some further directions that got partially answered:

1. Can we make ZK practical?
2. Can we make it few-round?
3. What are the cool applications of ZK? (48/?)

This is it for now! If someone wants to follow-up on this thread by talking about more modern constructions or implementations, it would be absolutely amazing! (49/49)

Latest Threads Unrolled: