Thread by @lennybronner, Something I've been thinking a lot about since election night is state [...]

Lenny Bronner

lennybronner

Something I've been thinking a lot about since election night is state fixed effects in election-night models. I have a lot of digging into data and results ahead of me but here are some initial thoughts:

1) In backtesting, state fixed effects helped a lot towards the middle of the night but hurt us earlier in the night.

They hurt because they add a lot of model complexity: without fixed effects our model has ~7 covariates, and with state fixed effects it has ~58.

They helped once we had seen enough counties, because it turns out that within-state swing is real and can be very informative. They also help sort out state-specific idiosyncrasies (e.g. Florida).
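
To make the jump from ~7 to ~58 covariates concrete, here is a minimal sketch of how one-hot state indicators widen each county's feature row. It assumes 51 state-level units (50 states plus DC), which is consistent with the numbers in the thread but is not stated there. The state labels and the `design_row` helper are hypothetical, and the thread does not say what the ~7 base covariates are.

```python
# A sketch under stated assumptions, not the model's actual code.
N_BASE_COVARIATES = 7  # "~7" in the thread; the covariates themselves are not listed

# Assumption: 51 state-level units (50 states plus DC), with placeholder labels.
STATES = [f"S{i:02d}" for i in range(51)]


def design_row(base_features, state, use_state_fixed_effects):
    """Return one county's feature row, optionally with one-hot state indicators."""
    row = list(base_features)
    if use_state_fixed_effects:
        # One extra column per state: 1.0 for the county's own state, 0.0 elsewhere.
        row.extend(1.0 if s == state else 0.0 for s in STATES)
    return row


base = [0.0] * N_BASE_COVARIATES
n_without = len(design_row(base, "S00", use_state_fixed_effects=False))
n_with = len(design_row(base, "S00", use_state_fixed_effects=True))
print(n_without, n_with)
```

Every extra column is a parameter that has to be estimated from the counties reported so far, which is why the added complexity costs the most early in the night.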

2) We had state fixed effects turned on for most of election week (while the model was on). This forced us to wait until at least one county was fully reported before turning the model on for a state.

3) However, we ended up turning them off because we noticed that our model was overfitting within states. We realized this because our confidence intervals for Philadelphia and Allegheny County kept collapsing.

There are no other counties in Pennsylvania like them, and with state fixed effects our model was unable to learn from similar counties outside of Pennsylvania. Once we stopped using fixed effects, other urban counties were informative enough.

Our confidence intervals obviously expanded once we stopped using state fixed effects (both because of increased uncertainty and because our Philadelphia and Allegheny CIs were no longer collapsing). But we saw this as a net positive.
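
One general mechanism by which a per-state intercept can overfit is sketched below: when a state has very few reported counties, its own intercept absorbs almost all of their error, so any interval built from those residuals looks too narrow. This is a toy illustration with made-up swings and hypothetical county labels, using an intercept-only model. It is not a reconstruction of the production model or of what happened in Pennsylvania.

```python
from statistics import mean

# Made-up swings (percentage points) for counties that have fully reported.
reported = [
    ("PA", "urban_pa", 6.0),  # hypothetical: the only reported county in its state
    ("OH", "urban_oh", 4.0),
    ("OH", "rural_oh", -2.0),
    ("MI", "urban_mi", 5.0),
    ("MI", "rural_mi", -1.0),
]


def residuals_pooled(rows):
    """Residuals from a single intercept shared by every county."""
    overall = mean(swing for _, _, swing in rows)
    return {county: swing - overall for _, county, swing in rows}


def residuals_state_fe(rows):
    """Residuals when every state gets its own intercept (a state fixed effect)."""
    by_state = {}
    for state, _, swing in rows:
        by_state.setdefault(state, []).append(swing)
    state_mean = {state: mean(vals) for state, vals in by_state.items()}
    return {county: swing - state_mean[state] for state, county, swing in rows}


pooled = residuals_pooled(reported)
fixed = residuals_state_fe(reported)
print(pooled["urban_pa"], fixed["urban_pa"])
```

With the state intercept, the lone county in its state is fit exactly, so its residual is zero. The pooled fit leaves a visible residual because it is also informed by the urban counties in other states.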

I think our main initial takeaways are these:
1) state fixed effects are good but not worth it in production
2) waiting until a state has *at least* one county finished reporting *is* worth it.
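
The waiting rule in the second takeaway can be sketched as a simple gate. The function name, the input shape, and the reporting fractions below are all hypothetical. Treating "fully reported" as a reported fraction of at least 1.0 is also an assumption, since the thread does not say how completeness is measured.

```python
def states_ready_for_model(county_reporting):
    """Return the states that have at least one fully reported county.

    county_reporting maps (state, county) -> fraction of expected vote reported.
    Assumption: a county counts as fully reported when the fraction is >= 1.0.
    """
    ready = set()
    for (state, _county), fraction in county_reporting.items():
        if fraction >= 1.0:
            ready.add(state)
    return sorted(ready)


# Made-up reporting fractions for illustration only.
status = {
    ("PA", "Philadelphia"): 0.62,
    ("PA", "Allegheny"): 0.80,
    ("PA", "Centre"): 1.0,
    ("FL", "Miami-Dade"): 0.95,
}
print(states_ready_for_model(status))  # ['PA']
```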
