Thread by @michael_nielsen, How often are numbers prime?Early on, there's a lot of prime numbers: [...]

How often are numbers prime?

Early on, there's a lot of prime numbers: 2, 3, 5, 7, ... But they thin out as you get to larger numbers. It gets "harder" for numbers to be prime, & "easier" for numbers to be composite; there's a kind of combinatorial explosion of composites

Suppose we define pi(n) to be number of prime numbers <= to n. Then a very famous result known as the prime number theorem says that for large n, pi(n) approaches n/ln(n), where ln is the natural logarithm.

This is an amazing fact!

It tells you that while the primes do thin out, they thin out very slowly - logarithmically slowly.

At n = 1,000, about 1 in every 7 numbers is prime; by n = 1 million it's 1 in every 14.

This is important for many reasons. One is modern cryptosystems often make use of very large primes. The approach taken is, I understand, to generate very large numbers at random, & then check until you find one which is prime(!) This only works quickly if primes are quite common

The prime number theorem is usually thought of as pretty hard to prove. Early proofs were long and arduous and detailed. Modern proofs are easier, but I've never seen one that seems easy.

But there is a pretty easy & in my opinion very beautiful & enlightening way of seeing that the density of primes is at least logarithmic.

(This thread based on a discussion in Papadimitriou's book on Computational Complexity, and on conversations with @LauraDeming, @kanjun, @3Blue1Brown, & @SebastienZany . I don't know where Papadimitriou got it. )

To explain the key idea I need some notation for the binomial coefficient n choose k. Twitter makes the standard notation hard, so I'll just use (n|k). It seems good enough.

As with many proofs, it takes a little while to grok, but ultimately is very beautiful & natural!

Lets look at a few details, then turn to build up a more intuitive sense of why & how you could have discovered this

There are two ideas in the proof. First is that (2n|n) is in some sense "large", or grows very quickly with n. In particular, that (2n|n) >= 2^n.

Second, for any prime p the prime power dividing (2n|n) cannot be larger than log_p(2n). This is in some sense a very "small" power - much smaller than it conceivably could have been given that (2n|n) rises exponentially with n.

How can we build up a very large number from a product of small primes to small powers? The only possible way is if there are a _lot_ of distinct primes!

A little more formally, from our two facts we have 2^n <= (2n|n) <= product_{p <= 2n} p^(log_p(2n)).

The right-hand side is just (2n)^pi(n), and taking the log to base 2 gives: pi(2n) >= n/log(2n)

In other words: (2n|n) has lots of small prime factors, raised to small powers, but gives rise to a big number, and this forces the density of primes to be pretty high!

Alright, how can we prove the two facts I used?

Here's several proofs of the first fact: induct on n (two lines of algebra); use Stirling's formula (gives you a stronger result, actually). And here's my favorite proof:

Notice that (2n|n)/2^n is the probability of n heads in a sequence of 2n fair coin tosses. That's greater than the probability of any other possible number of heads, so (2n+1) (2n|n)/2^2n >= 1

(Equivalently, (2n|n) is the largest coefficient in the binomial series for (1+1)^2n)

Thus (2n|n) >= 2^(2n)/(2n+1), which is quite a bit stronger than my original claim, and gives an even better bound on the density of primes.

A fun challenge: find a fourth proof, different than any of the above. There are many more, and each will of course deepen your understanding!

What about proving the second fact, about the power of prime divisors of (2n|n)? This is a little tricky to write on twitter - it's much easier visually, at a whiteboard - although it's really not so challenging. So I'll just sketch - you need to fill in details at a whiteboard

A good approach is to focus on the case p = 2, and understanding the powers of 2 that divide (2n|n) = (2n)*(2n-1)*...*(n+1) / n*(n-1)*....*1.

In particular, if you reflect for a bit, you'll see that the denominator has quite a few multiples of 2: 2, 4, 6, 8,... And the numerator also has quite few multiples of 2: n+1, n+3, n+5... (if n is odd, change in the obvious way if n is even).

In a similar way, the multiples of 4 in the denominator are 4, 8, 12, ..., and in the numerator are things like n+1, n+5, n+9,.... (depends exactly on value of n mod 4).

In fact, for each power of two the numerator and denominator have almost exactly the same number of multiples of that power of two. These all cancel out, but occasionally the numerator will have one "extra". Counting these up is what gives the bound on the prime power.

Apologies for not being more precise here - I need to either be very pedantic & long-winded, or go to a more visual or notationally oriented proof, neither of which is well suited to Twitter.

So: where did this proof come from? How could you possibly have discovered it? I think there's a few ways you could have discovered it.

One is that you might just have been fooling around with questions like "What is the prime factorization of n!"? If you use that as your starting point you start to get some (very bad) bounds on the density of primes.

But you might think: oh, can I improve it? And one way is to cancel out some factors in n!, without making it too much smaller. So it makes sense to consider things like (n|2), (n|3) and so on. Pretty soon, you'd twig: oh, we're getting a lot of corresponding cancellation!

At that point you'd be away, and considering things like (2n|n) start to make a lot of sense, and you'd be getting great bounds on the density of primes.

Can we sum up in the proof in a sentence or two?

Let's try one very rough approach.

The number (2n|n) grows exponentially quickly with n, but all its prime powers are small, because there is so much cancellation between numerator & denominator, leaving only room for logarithmically many powers of any prime. This forces at least a logarithmic density of primes

It should be possible to do a _lot_ better, far more compressed and beautiful. And to also express much more crisply (and, in some cases, visually or through other representations) other ideas in the proof. Still, I think this is not bad for Twitter!

More good questions: what forces you to (2n|n)? Are there other good choices? Can we find a better function? I can't, but found the search quite illuminating. I _can_ improve the bound on the prime powers a bit, though haven't figured out a really elegant expression.

Something I don't know is whether this approach or something similar can be pushed through to give the pi(n) ~ n/ln(n) result. That'd be nice!

Something I love about this argument: you start off with a very fundamental question about the prime numbers - the atoms that make up the numbers. And by thinking about it you can say something astounding about how often they occur, with complete confidence!

Notice also just how shocking the result is. In some sense it proves that there are "lot" of prime numbers, forever. If you think about that, at first proving it seems hopeless: how could you get a handle on it? But once you start to get a little leverage....

Actually, on that leverage point: if you come at this problem directly it's hard to see a leverage point. But simply due to curiosity you may wonder about questions like "How many prime factors does n! have?" And when you do you generate tiny bits of insight, & sorta bootstrap...

Latest Threads Unrolled: