I had a statistical conniption about the Aberdeen study published in BMJOpen earlier today. Let's see if I can explain my concerns in a less techie fashion. The gist of the article is that countries with more flights arriving had more deaths. That so? 1/n https://www.bbc.co.uk/news/uk-scotland-north-east-orkney-shetland-55919040
The article, which you can read at https://bmjopen.bmj.com/content/bmjopen/11/2/e042034.full.pdf relies on the following scatterplot to make its point. Indeed, there does appear to be a relationship between arrivals and (logged) daily death rates. 2/n
That graphical presentation is supported by a multivariable statistical model, which shows that the result is robust to including lots of other potential confounders - (log) population, income, age profile, health status, density, etc. So far so good. 3/n
But... eagle-eyed viewers might have looked at that graph and thought - wait a second, countries with lots of international arrivals probably have large populations and they might also have lots of people dying from COVID every day. You know, because they're large. 4/n
Now the statistical model has a control for population size - that means it's looking at the relationship between the number of arrivals and the number of deaths, netting out the 'effect' of population. That feels ok... but... 5/n
Wouldn't it just make more sense to normalize flights and deaths per head of population? That way we really have measures of risk of death and how internationally connected a given country is. And once we do that... 6/n
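A toy sketch of the worry in the last few tweets, using entirely made-up data (not the study's data, and not the code linked later in this thread). It assumes 2,000 fictional "countries" where arrivals and deaths both simply scale with population and are otherwise unrelated. All names and numbers here are illustrative assumptions. In that world the raw (logged) counts should look strongly related, and the per-head versions should not.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the toy example is repeatable

# ASSUMPTION: fictional countries; size drives both variables, nothing else does.
log_pop, log_arrivals, log_deaths = [], [], []
for _ in range(2000):
    lp = rng.gauss(16, 2)                           # log population
    log_pop.append(lp)
    log_arrivals.append(lp + rng.gauss(-3, 0.5))    # arrivals scale with size
    log_deaths.append(lp + rng.gauss(-12, 0.5))     # deaths scale with size

# Raw counts: big countries have more of everything.
raw_r = pearson(log_arrivals, log_deaths)

# Per head: log(x / pop) = log(x) - log(pop).
arrivals_pc = [a - p for a, p in zip(log_arrivals, log_pop)]
deaths_pc = [d - p for d, p in zip(log_deaths, log_pop)]
per_capita_r = pearson(arrivals_pc, deaths_pc)

print(f"raw counts r = {raw_r:.2f}, per capita r = {per_capita_r:.2f}")
```

This only illustrates why size alone can manufacture a relationship between raw counts; it says nothing about what the real data show.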
Psych! The relationship is still there. In a bivariate sense. Good news. Except, now when we try and control for other things, such as income or region, the whole thing blows up. Let's see that 7/n
Let's begin with income. If you look at the graph I just showed you'll notice the kinds of countries that are towards the top-right of the graph are wealthy and those towards the bottom left are poorer. So let's see what happens if we split the sample into four income groups. 8/n
Now we see that poorer countries (top left) have low numbers of flights and low numbers of deaths. Among richer countries, flights and deaths are generally higher. But more importantly, within each income group there's no real relationship between flights and deaths, except in Box 3. 9/n
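The split-by-income idea can be sketched in a few lines of Python. Again this is made-up data under a loud assumption: income group sets the general level of both flights and deaths per head, and within a group the two are unrelated. The group labels, sample sizes, and numbers are all hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the toy example is repeatable

# ASSUMPTION: four fictional income groups, 500 fictional countries each.
groups = {1: ([], []), 2: ([], []), 3: ([], []), 4: ([], [])}
for g, (flights, deaths) in groups.items():
    level = 2.0 * (g - 1)  # richer group => higher level of both variables
    for _ in range(500):
        flights.append(level + rng.gauss(0, 1))
        deaths.append(level + rng.gauss(0, 1))

# Pooled: lump every country together, as in a single scatterplot.
all_flights = [x for f, _ in groups.values() for x in f]
all_deaths = [y for _, d in groups.values() for y in d]
pooled_r = pearson(all_flights, all_deaths)

# Split: one correlation per income group, as in the four-panel graph.
within_r = {g: pearson(f, d) for g, (f, d) in groups.items()}

print(f"pooled r = {pooled_r:.2f}")
for g, r in within_r.items():
    print(f"income group {g}: r = {r:.2f}")
```

In this toy world the pooled correlation is driven entirely by differences between groups, which is the pattern the four panels are meant to expose.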
I lied earlier when I said there wouldn't be any maths. Let me just show you the difference between a simple linear regression with just flights per capita (left) and another that includes GDP per capita as well (right). You see the *** thingies next to flights? They vanish. 10/n
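For anyone who wants to see the mechanics, here is a minimal stdlib-only Python sketch of "with and without a control". It is not the regression from the thread: the data are simulated under the assumption that GDP per capita drives both flights and deaths, and that flights have no effect of their own. The controlled coefficient is obtained by partialling GDP out of both variables first (the Frisch-Waugh-Lovell approach), which gives the same flights coefficient as a regression including both. It shows the coefficient shrinking, not the significance stars.

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """What is left of ys after a simple OLS fit on xs (with intercept)."""
    n = len(xs)
    b = slope(xs, ys)
    a = sum(ys) / n - b * sum(xs) / n
    return [y - (a + b * x) for x, y in zip(xs, ys)]

rng = random.Random(2)  # fixed seed so the toy example is repeatable

# ASSUMPTION: fictional countries; GDP is a common cause, flights do nothing.
gdp = [rng.gauss(0, 1) for _ in range(2000)]
flights = [g + rng.gauss(0, 1) for g in gdp]
deaths = [g + rng.gauss(0, 1) for g in gdp]

# Left-hand column: deaths on flights only.
bivariate_slope = slope(flights, deaths)

# Right-hand column: deaths on flights, holding GDP fixed.
controlled_slope = slope(residuals(gdp, flights), residuals(gdp, deaths))

print(f"flights coefficient, no control:   {bivariate_slope:.2f}")
print(f"flights coefficient, GDP included: {controlled_slope:.2f}")
```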
The same is true if we try and take global region into account. We add an indicator for Europe, Asia, Africa, N America, and S America. We can do a graph... 11/n
Or we can run one of those linear regression thingies with an indicator for each region. Either way, the flights variable basically loses all its power to explain deaths. 12/n
I went into much more (well, somewhat more) technical detail earlier (see this thread, which will also allow you to read my code). But I think, given how much press this study got, it's important to try and show things simply. 14/n https://twitter.com/benwansell/status/1357343017900810250?s=20
The moral of the tale is that the press should be very careful when reading statistical papers with vaguely plausible results but very little data. We really can't say anything for sure from this study and I certainly wouldn't use it to advocate closed borders. 15/n