Might take a bit, but I want to try to explain (in 2-3 bursts) why I think ATM the ONS Infection Survey is like trying to weigh out cake ingredients on your bathroom scales. This expands on what I wrote here https://www.spectator.co.uk/article/how-worried-should-we-be-about-a-second-wave- but with a bit more maths (but not too much!)
What we want to know is: "what proportion of England's population have the virus right now?". We'll call that number p, for prevalence. Ideally, we'd just test everyone, count the positives, and be done with it. But of course, we don't have the testing capacity to do that.
So instead, ONS do an opinion poll. They pick a large sample of people completely at random and test them, so they should be representative of the population. We call the number of people in the sample n. Let's imagine that n is 25000 and that the prevalence is 0.04%.
On average, we'd expect 10 infected people. However, it probably won't be exactly 10: Random Samples Are Random. Luckily we know how random. If we write X for the number of infected people in the sample, X will be bell-shaped and centred on 10.
The fancy way of saying it is that X is approximately normal with mean (n times p) and variance (n times p) (strictly the variance is np(1-p), but with p this tiny those are the same thing). In practice, what it means is that 95% of the time, the sample will have between 4 and 16 infected people in it. Which is fantastic, except ..
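(If you want to check that "4 to 16" range yourself, here's a quick Python sketch of my own, not anything from ONS. The scipy line pulls the exact binomial quantiles as a sanity check on the rough normal answer.)

```python
import math
from scipy.stats import binom

n, p = 25000, 0.0004            # sample size and assumed prevalence (0.04%)
mean = n * p                    # expected positives: 10
sd = math.sqrt(n * p)           # approximate standard deviation: ~3.16

# Rough normal approximation: mean plus or minus 2 standard deviations.
print(f"normal approx: {mean - 2*sd:.1f} to {mean + 2*sd:.1f}")
# -> 3.7 to 16.3, i.e. "between 4 and 16"

# Sanity check against the exact binomial quantiles:
print("exact binomial:", binom.ppf(0.025, n, p), "to", binom.ppf(0.975, n, p))
# -> 4.0 to 17.0, close to the rough answer above
```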
Everything I just said relies on us knowing the prevalence p. But that was the thing we were trying to find out.

But so far, we only used the 1st Year Probability course I lecture, so we'd better try the 1st Year Statistics course I used to lecture.
Imagine we did the experiment: we swabbed 25000 people and 9 of them were positive. (This is about what ONS see at the moment.) The lazy thing to do is to guess that prevalence is 9/25000 = 0.036%. But that almost certainly isn't exactly right. It's plausible, but how wrong might it be?
There's a trick we can use. We can pivot.
If we knew the prevalence p, we had a way of working out what range of values of X was compatible with it. We can flip that round and ask: for a given X, what range of values of p is compatible with that?
For the nerds in the audience, I make it that the smallest plausible value of the prevalence p in this sense is (2+X - 2*sqrt(1+X))/n and the largest is (2+X + 2*sqrt(1+X))/n.

(But I may have messed this up. Don't @ me)
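(For anyone who'd rather check it numerically than take my word for it, here's a little Python sketch of mine. It just codes up the formula; the comment records where the quadratic comes from.)

```python
import math

def plausible_prevalence_range(x, n):
    """Invert the normal approximation: find the p's with (x - n*p)**2 <= 4*n*p.
    That's a quadratic in p, n^2*p^2 - 2*n*(x + 2)*p + x^2 <= 0, whose roots
    are the two endpoints below."""
    lo = (2 + x - 2 * math.sqrt(1 + x)) / n
    hi = (2 + x + 2 * math.sqrt(1 + x)) / n
    return lo, hi

lo, hi = plausible_prevalence_range(9, 25000)
print(f"{lo:.4%} to {hi:.4%}")  # -> about 0.0187% to 0.0693%
```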
Super-nerds will notice that this isn't symmetric around X/n, which I think is why the top of each blue bar is not in the middle of the corresponding range on these ONS plots.
For example, if I did it right, with 9 positives out of 25000, you get a plausible range of prevalence p being somewhere between 0.018% and 0.069% (nice).

This is quite a big range - roughly a factor of 2 either side. I'll explain why this is a problem in a bit.
So, the point is, we want to use this survey as a way to tell whether prevalence has changed week on week. But the width of these confidence intervals makes the survey a very blunt tool for that.
For example, suppose we see 9 infections in the sample one week. If your rule is to raise a flag whenever the next week's estimate falls outside the old confidence interval, that requires quite a big jump: you'd need 18 infections .. and that feels like quite a conservative rule to have to apply.
If you only put up a flag when old and new confidence intervals don't overlap, I think you'd need 26 infections in the sample, which is even worse.
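(Again, a quick Python check of my own for those two thresholds, using the same interval formula as above:)

```python
import math

n = 25000
# Upper end of the plausible range after seeing 9 positives:
old_hi = (2 + 9 + 2 * math.sqrt(1 + 9)) / n

# Rule 1: flag when the new point estimate x/n clears the old upper bound.
x = 0
while x / n <= old_hi:
    x += 1
print("rule 1 needs", x, "positives")   # -> 18

# Rule 2: flag when the new interval's LOWER end clears the old upper bound,
# i.e. the old and new confidence intervals no longer overlap.
x = 0
while (2 + x - 2 * math.sqrt(1 + x)) / n <= old_hi:
    x += 1
print("rule 2 needs", x, "positives")   # -> 26
```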
Realistically, you'd look at trends, but only getting this data once a week means it might take a fortnight or more to spot that anything has changed.

tldr: I think this data is interesting, but now that we're at this level of prevalence, it won't be the way we spot a second wave.