One thing I do know after US election poll results don’t match the actual results for the second time:

We need to better understand Missing Data. There are 3 types of missing data in polls and surveys - let me walk through them /1 https://www.theatlantic.com/ideas/archive/2020/11/polling-catastrophe/616986/
Firstly - a note about sampling and uncertainty. All polls and surveys are samples from the whole population. Pollsters work hard to ensure their samples are properly representative of the whole population, and can also apply weights to their results to make up for groups who /2
may be hard to recruit for the poll. So sampling is one thing. But missing data is different. Republicans who are contacted by pollsters, but who refuse to declare who they will vote for, or refuse to talk to the pollster, are, in a sense, “cases” with missing data. /3
There are three types of missing data: (1) missing completely at random, (2) missing at random, and (3) missing not at random.

The first two can be overcome using statistical methods; the last one is a problem. /4
Data that are Missing Completely At Random (1) are exactly as the name suggests - whether a response is missing has nothing to do with the person or their answer (eg phone lines go down across a state), so the remaining data can still be used to estimate the overall result /5
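To make that concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration (the electorate size, the 52% true support, the 30% loss rate), not real polling data. It shows that when responses vanish purely by chance, the share among the remaining responses stays close to the share in the full sample.

```python
import random

random.seed(0)

# Hypothetical electorate: 1 = votes for candidate A, 0 = votes for candidate B.
N = 100_000
TRUE_SUPPORT = 0.52  # assumed value, for illustration only
votes = [1 if random.random() < TRUE_SUPPORT else 0 for _ in range(N)]

# MCAR: every response has the same 30% chance of going missing,
# regardless of who the person is or how they intend to vote.
observed = [v for v in votes if random.random() >= 0.30]

full_share = sum(votes) / len(votes)
observed_share = sum(observed) / len(observed)
print(f"full sample: {full_share:.3f}, after MCAR loss: {observed_share:.3f}")
```

The only cost here is a smaller sample, and therefore a wider margin of error; the estimate itself is not pushed in either direction.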
With data that are Missing At Random (situation 2), people’s responses are missing, but the missingness depends only on information we did record, so we can predict the missing responses using the rest of the dataset. Regression models, for instance, can be used to create a “best guess” for what’s missing /6
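A rough sketch of that idea, again with invented numbers: assume age group is known for everyone, younger people answer less often, and age group also predicts the vote. The group names, support levels and response rates below are all hypothetical. Averaging whoever answered is biased, but filling in non-respondents with their group's estimate (what a regression on a single categorical predictor boils down to) gets close to the truth.

```python
import random

random.seed(1)

# Hypothetical setup: age group is recorded for everyone,
# but the vote answer is missing more often for younger people.
SUPPORT = {"young": 0.65, "old": 0.40}        # assumed P(vote A | group)
RESPONSE_RATE = {"young": 0.30, "old": 0.80}  # assumed P(answers | group)

people = []
for _ in range(100_000):
    group = "young" if random.random() < 0.5 else "old"
    vote = 1 if random.random() < SUPPORT[group] else 0
    answered = random.random() < RESPONSE_RATE[group]
    people.append((group, vote, answered))

true_share = sum(v for _, v, _ in people) / len(people)

# Naive estimate: just average whoever answered.
answers = [v for _, v, a in people if a]
naive_share = sum(answers) / len(answers)

# MAR fix: estimate support within each observed group, then fill in
# non-respondents with their own group's estimate.
group_mean = {}
for g in SUPPORT:
    got = [v for grp, v, a in people if grp == g and a]
    group_mean[g] = sum(got) / len(got)

imputed_share = sum(v if a else group_mean[g] for g, v, a in people) / len(people)

print(f"true: {true_share:.3f}, naive: {naive_share:.3f}, imputed: {imputed_share:.3f}")
```

This works only because the thing driving the missingness (age group, in this toy setup) is in the dataset. Weighting respondents by group would give the same correction here.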
Lastly, data that are Missing Not At Random are the most difficult. In this situation, the *reason* the data is missing is the *response* itself. A classic example might be asking people what their income is - people with very high incomes might refuse to say, which /7
means that there is a Not Random component to the pattern of missing data. Another example is asking people what their weight is - overweight ppl may decline to answer due to stigma. In these cases it isn’t easy to use the data that’s left to estimate the missing responses /8
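Here is a sketch of why this case is so stubborn, under assumed numbers that are purely illustrative: supporters of a hypothetical candidate A answer less often than supporters of candidate B, inside every age group. The same group-based fill-in that handles Missing At Random data is applied, and it stays well off the true figure, because nothing recorded in the data explains who went missing.

```python
import random

random.seed(2)

# Hypothetical setup: whether someone answers depends on their vote itself.
SUPPORT = {"young": 0.45, "old": 0.55}  # assumed P(vote A | group)
RESPONSE_RATE = {1: 0.40, 0: 0.60}      # assumed P(answers | vote): the MNAR part

people = []
for _ in range(100_000):
    group = "young" if random.random() < 0.5 else "old"
    vote = 1 if random.random() < SUPPORT[group] else 0
    answered = random.random() < RESPONSE_RATE[vote]
    people.append((group, vote, answered))

true_share = sum(v for _, v, _ in people) / len(people)

# Same group-mean fill-in as one would use for Missing At Random data.
group_mean = {}
for g in SUPPORT:
    got = [v for grp, v, a in people if grp == g and a]
    group_mean[g] = sum(got) / len(got)

adjusted_share = sum(v if a else group_mean[g] for g, v, a in people) / len(people)

print(f"true: {true_share:.3f}, adjusted: {adjusted_share:.3f}")
```

In this toy example the adjusted estimate understates support for candidate A even after the age-group correction, since the respondents in each group are already skewed towards candidate B.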
US polling is failing, and one reason is likely to be data that are Missing Not At Random: people may decline to respond to polls because of the stigma of voting for Trump. Missing data affects most quantitative research where humans are involved. Reporting how much data is missing helps. /end
You can follow @DrJinRussell.