This isn't right overall. A lot of what gets cast as insurmountable problems are basic modeling questions. If a shop actually ignores them completely, then yes, that shop has a bad model, e.g. the shops in 2016 that assumed independent errors across states. https://www.nytimes.com/2020/11/01/opinion/election-forecasts-modeling-flaws.html
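To see why that independence assumption alone can wreck a model, here's a toy Monte Carlo (all numbers made up, not any shop's actual setup): the trailing candidate's chance of sweeping three close states is tiny if state errors are independent, but much larger once the errors are allowed to move together, which is what a national polling miss looks like.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000

# Toy setup: three swing states, each with the same small polled lead (made-up numbers).
leads = np.array([2.0, 2.0, 2.0])   # candidate A's polled margin, in points
state_sd = 3.0                      # typical state-level polling error, in points

def upset_prob(rho):
    """P(trailing candidate B wins all three states) when polling errors share correlation rho."""
    cov = state_sd**2 * (np.full((3, 3), rho) + (1 - rho) * np.eye(3))
    errors = rng.multivariate_normal(np.zeros(3), cov, size=n_sims)
    margins = leads + errors        # candidate A's simulated margins
    return (margins < 0).all(axis=1).mean()

print("independent errors (rho=0):  ", round(upset_prob(0.0), 3))
print("correlated errors (rho=0.75):", round(upset_prob(0.75), 3))
```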
538 releases calibration plots for all of its out-of-sample forecasts, and its predictions are, on average, very well calibrated. That's a shop-level rather than a model-level reputation, but the performance isn't unknowable. https://projects.fivethirtyeight.com/checking-our-work/
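For anyone unfamiliar with what a calibration check actually is, here's a minimal sketch on fake data (the binning scheme and the data are mine, not 538's): group forecasts by predicted probability and compare each bin's average forecast to how often the event actually happened.

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Bin forecast probabilities and compare each bin's average forecast
    to the observed frequency of the event in that bin."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((b / n_bins, probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows  # (bin lower edge, mean forecast, observed frequency, count)

# Fake data standing in for a shop's archive of forecasts and outcomes.
rng = np.random.default_rng(1)
p = rng.uniform(0, 1, 2000)                # forecast probabilities
y = (rng.random(2000) < p).astype(float)   # outcomes drawn consistent with the forecasts
for lo, fc, obs, n in calibration_table(p, y):
    print(f"bin {lo:.1f}+: mean forecast {fc:.2f}, observed freq {obs:.2f}, n={n}")
```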
There's also nothing uniquely broken about having n=12 elections. For starters, they forecast states, so it's 600+ state-level poll-to-outcome pairs, with of course some multilevel nesting within year. They also forecast House, Senate, and governor races, which adds to the shop-level track record and priors.
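A back-of-the-envelope on what that nesting does to the information content, using the standard cluster design-effect formula (the 600 races and the correlation values are placeholders, not estimates): the effective sample size stays above 12 even when within-year errors are strongly correlated.

```python
# Rough design-effect arithmetic, assuming ~600 state-level races nested in
# ~12 presidential years with within-year correlation rho (all numbers illustrative).
def effective_n(n_total, n_clusters, rho):
    k = n_total / n_clusters                  # average races per election year
    return n_total / (1 + (k - 1) * rho)      # Kish cluster design-effect formula

for rho in (0.0, 0.3, 0.7):
    print(f"rho={rho}: effective n ~ {effective_n(600, 12, rho):.0f}")
```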
"polling data we have is relatively sparse.' is just false. There are 25 different polls of Florida right now. Lack of measurement at least today isn't the problem, although it gets sparser the further back you go.
In fact, fundamentals are used as a very weak prior exactly when polls are sparse; by election day, I believe their weight is reduced to zero or nearly zero.
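This isn't 538's actual weighting scheme, but a minimal sketch of the idea as a precision-weighted blend: as polls pile up, the poll average's precision grows and the fundamentals prior's weight heads toward zero.

```python
def blend(prior_mean, prior_sd, poll_mean, poll_sd, n_polls):
    """Precision-weighted average of a fundamentals prior and a poll average.
    As polls accumulate, the poll term's precision grows and the prior's weight -> 0."""
    prior_prec = 1 / prior_sd**2
    poll_prec = n_polls / poll_sd**2
    w_prior = prior_prec / (prior_prec + poll_prec)
    return w_prior, w_prior * prior_mean + (1 - w_prior) * poll_mean

# Illustrative margins in points: fundamentals say +1, polls say +3.
for n in (1, 5, 25):
    w, est = blend(prior_mean=1.0, prior_sd=4.0, poll_mean=3.0, poll_sd=5.0, n_polls=n)
    print(f"{n:>2} polls: prior weight {w:.2f}, blended margin {est:+.2f}")
```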
"Even small shifts like that matter greatly" is true for only close elections. Nate keeps talking about 1 or 2 standard deviation sized typical errors in the polls. In 2016, Trump was only 1 typical standard error away from winning, the forcast explicitly says what you're warning
Endogeneity is also overblown. If forecasts didn't exist, hack political reporters still would. It's not like Comey couldn't have read USA Today and reached the same rationalization. Turning out or staying home based on how close the race looks is already ubiquitous.
The alternative is kind of insane: be less systematic about how we evaluate polling and voting, on the off chance that less informed voters become more likely to turn out. There's clearly signal here, and there are clearly people working hard to extract it for us.
The solution is to make people more sophisticated at evaluating forecasts so they in turn demand more transparency and performance evaluations across shops. Sowing general distrust is a poor substitute for specific methodological critiques and solutions.
We should be so lucky to have the social sciences start systematically documenting even our trash in-sample performance the way good forecasters routinely do for their genuinely out-of-sample future performance.
2016 was special because so many cheap copycats of 538 sprang up (NYT, Wang, etc.) with awful one-time >99% Clinton forecasts. Sturgeon's law applies: 90% of anything is crap. Skepticism should be the default all the time, and then there are formal ways for a model to earn trust.