"Discovery" is by far at the most damaging concept is science and, in my opinion, at the heart of all of the poor methodology frustrating so many scientific fields.
Because measurements are inherently unpredictable -- whether you want to interpret that epistemologically as observer resolution or ontologically as noise doesn't matter -- the inferences we draw from them will never be able to inform unambiguous decisions about discovery claims.
There's always _some_ probability that the realized observation fluctuated in a way to mimic phenomena that aren't actually there or obscure phenomena that are. Quantifying these possibilities is one of the key tenets of statistics.
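As a concrete illustration, here is a minimal sketch of that quantification for a one-sided z-test of a normal mean with known unit standard error; the significance level and the effect size `delta` are illustrative assumptions, not numbers from this thread.

```python
# Minimal sketch: probability that noise mimics an effect (false positive) and
# probability that a real effect of size `delta` is obscured (false negative)
# for a one-sided z-test.  Both the threshold and `delta` are assumptions.
from scipy.stats import norm

alpha = 0.05                          # significance level
z_crit = norm.ppf(1 - alpha)          # rejection threshold for the test statistic
delta = 2.0                           # assumed true effect size, in standard errors

p_mimic = 1 - norm.cdf(z_crit)        # noise alone exceeds the threshold
p_obscure = norm.cdf(z_crit - delta)  # a real effect fails to exceed it

print(f"P(noise mimics an effect)  = {p_mimic:.3f}")    # ~0.05 by construction
print(f"P(real effect is obscured) = {p_obscure:.3f}")   # ~0.36 for delta = 2
```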
But this also means that statistics is not capable of drawing certain conclusions from observations. No matter what you've been told, we can't actually "filter out noise" or the like; all we can do is account for it in our uncertainties.
Critically, we have to interpret _all_ statistical methodologies from this perspective. In particular, hypothesis testing is all about quantifying the risk that arises when trying to make decisions under uncertainty, but it does not and cannot provide for risk-free decisions.
For example, even under ideal conditions for null hypothesis significance testing, with high significance and power, a small p-value does not always indicate that the alternative hypothesis is _true_. The procedure quantifies false positives but cannot eliminate them.
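A minimal simulation sketch of this point, assuming a two-sided one-sample t-test on standard normal data (sample size, number of replications, and seed are arbitrary choices): even when the null hypothesis is exactly true, roughly a significance level's worth of experiments still reject it.

```python
# Simulate many experiments in which the null hypothesis is exactly true and
# count how often a standard t-test rejects it anyway.  The false positives
# are quantified, not eliminated.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
alpha, n_obs, n_experiments = 0.05, 30, 10_000

rejections = 0
for _ in range(n_experiments):
    y = rng.normal(loc=0.0, scale=1.0, size=n_obs)   # the null is exactly true
    p_value = ttest_1samp(y, popmean=0.0).pvalue
    rejections += (p_value < alpha)

print(f"False positive rate: {rejections / n_experiments:.3f}")  # close to 0.05
```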
Anyone interpreting a rejection of the null hypothesis as a definite discovery will soon enough find themselves stubbornly proclaiming erroneous results. This is the reality even under ideal conditions; imagine what the reality is like when methods are sloppily executed.
By far the best way to interpret null hypothesis significance testing is as a method for ensuring that the _population_ of results, i.e. the literature, is reasonably well-behaved. Uniformly high significance and power ensure that most results capture meaningful phenomenology.
But again, this quantification of the population behavior says nothing about any particular result. Cultural problems are _inevitable_ when 5% of the results are expected to be false positives but everyone believes that their result is precious and correct.
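A back-of-the-envelope sketch of that population-level behavior; the prevalence of real effects among tested hypotheses is an assumption for illustration, not a number from this thread.

```python
# How many positive results in a literature are false positives, assuming a
# given significance level, power, and prevalence of real effects?
alpha = 0.05        # significance level (false positive rate under the null)
power = 0.80        # probability of detecting a real effect
prevalence = 0.10   # assumed fraction of tested hypotheses that are real effects

true_positives = power * prevalence
false_positives = alpha * (1 - prevalence)
false_discovery_fraction = false_positives / (true_positives + false_positives)

print(f"Fraction of positive results that are false: {false_discovery_fraction:.2f}")
# ~0.36 under these assumptions, even with uniformly high significance and power
```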
Of course when people _need_ to present their results as precious and correct to secure funding and participate in the system at all then robust statistical practice literally becomes impossible.
To be clear, I'm using the null hypothesis significance testing framework just as an example; I do not advocate it as a useful scientific tool. Calibration of inferences and decisions is critical, but frequentist calibration is typically incompatible with sophisticated modeling.
The main lesson here is that science as an institution will always be inherently problematic when definite discoveries are expected and incentivized.
Moreover, I personally have always found that the strength of one's belief in discovery correlates strongly with their capacity as a productive collaborator. If you're transitioning into more formal statistical methodologies, it's a great way to assess potential colleagues.