Imagine if we optimized for number of independent replications over number of citations.
A citation typically assumes the cited study is true. In a paper that cites 50 other papers, most of those cited results are never actually tested by the citing authors.

This leads to chains of unsupported inference, as in the replication crisis.

(Note: I have not replicated this graph!)
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
Think about the reproducibility problem in software first. This is much easier than (say) biological science because it is usually cheap and fast to independently check whether software works on any given computer. It’s still not easy to make code work reliably everywhere.
In biomedicine, running a replication is time-consuming and expensive.

There are interesting companies like Transcriptic and Emerald Cloud Labs that have been working on an "AWS for biomedicine" to speed this up.
Ideally:

1) A study’s PDF should be generated from code & data, as per reproducible research
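
A minimal sketch of what "the paper is a build artifact" could look like in practice; the file names below are hypothetical:

```python
# Minimal sketch: the manuscript and every figure are regenerated from raw data
# with one command, so the PDF is a derived artifact, never hand-edited.
# File names below are hypothetical.
import subprocess

import matplotlib.pyplot as plt
import pandas as pd


def build_figures() -> None:
    df = pd.read_csv("data/raw_measurements.csv")      # raw data lives in the repo
    df.groupby("condition")["effect"].mean().plot(kind="bar")
    plt.savefig("figures/fig1.png", dpi=300)


def build_manuscript() -> None:
    # Compile the LaTeX source so the PDF always matches the checked-in code.
    subprocess.run(["pdflatex", "paper.tex"], check=True)


if __name__ == "__main__":
    build_figures()
    build_manuscript()
```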

2) All of that (code, data, PDF) should be not just open-access but *on-chain*, to give true permalinks and to ensure the code and data are timestamped. https://www.coursera.org/learn/reproducible-research
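
A sketch of the content-addressing piece, assuming SHA-256 hashes serve as the permalinks; the chain or timestamping service itself is left abstract:

```python
# Sketch: content-address each artifact so a citation is an immutable hash,
# not a link that can rot. The chain/timestamping service is left abstract.
import hashlib
import json
import time
from pathlib import Path


def sha256_file(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def study_manifest(paths: list[str]) -> dict:
    return {
        "artifacts": {p: sha256_file(p) for p in paths},
        # In practice the timestamp comes from the chain, not a local clock.
        "timestamp": int(time.time()),
    }


manifest = study_manifest(["paper.pdf", "analysis.py", "data/raw_measurements.csv"])
manifest_id = hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).hexdigest()
print(manifest_id)  # cite the study by this hash: a link that cannot silently change
```
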
3) Now subsequent on-chain studies can not just cite previous studies but *import* them, much like DeFi composability, but for scientific results.
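
A sketch of "import, don't just cite"; fetch_artifact() is a hypothetical stand-in for whatever content-addressed store (IPFS, a chain) holds the published artifacts:

```python
# Sketch: a downstream study *imports* an upstream dataset by its content hash
# and verifies it byte-for-byte before building on it. fetch_artifact() is a
# hypothetical stand-in for an IPFS gateway, chain storage, or similar.
import hashlib
from io import BytesIO

import pandas as pd


def fetch_artifact(content_hash: str) -> bytes:
    raise NotImplementedError("resolve the hash via IPFS, a chain, etc.")


def import_dataset(content_hash: str) -> pd.DataFrame:
    raw = fetch_artifact(content_hash)
    if hashlib.sha256(raw).hexdigest() != content_hash:
        raise ValueError("artifact does not match the cited hash")
    return pd.read_csv(BytesIO(raw))


# Downstream analysis then starts from the exact bytes the upstream study published:
# upstream = import_dataset("<manifest hash from the cited study>")
```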

4) Each author then serves as a (potentially trustable) oracle, posting code and data on-chain with a digital signature.
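
A sketch of that signing step, using Ed25519 from the Python cryptography package; key management is simplified to a freshly generated key:

```python
# Sketch: the author signs the study manifest so anyone can verify who posted it.
# Uses Ed25519 from the `cryptography` package; key handling is deliberately simplified.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

author_key = Ed25519PrivateKey.generate()     # in practice, a long-lived identity key
manifest_bytes = b'{"artifacts": {...}, "timestamp": 1600000000}'  # illustrative payload

signature = author_key.sign(manifest_bytes)   # posted on-chain alongside the manifest

# Any reader verifies against the author's published public key;
# verify() raises InvalidSignature if the bytes or signature were altered.
author_key.public_key().verify(signature, manifest_bytes)
```
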
5) Obvious caveat: the data itself could be wrong. That's true, and more on this shortly.

But note that even full access to the code and data behind each study in your information supply chain would be valuable.

In software, we can view source for all open-source dependencies.
6) So, what if the data is wrong? As noted earlier in the thread, the methods section of a paper can be designed for the most mechanical possible independent replication of the dataset.

Just as a blog post estimates "time to read", we can estimate a study's "cost & time to replicate the data".
7) For example, given a mechanical protocol for reproducing the dataset underpinning a study:

- Spend X to replicate at CRO
- Spend Y to replicate at core facility
- Spend Z to replicate by re-running the surveys on MTurk

And so on. Package studies for independent reproducibility.
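
Here is a sketch of what a machine-readable "replication menu" shipped with a study might look like; the venues, hashes, and cost/time figures are illustrative only:

```python
# Sketch of a machine-readable "replication menu" attached to a study.
# Venues, hashes, and cost/time figures are illustrative only.
from dataclasses import dataclass


@dataclass
class ReplicationOption:
    venue: str            # e.g. "CRO", "university core facility", "MTurk rerun"
    protocol_hash: str    # content hash of the mechanical step-by-step protocol
    est_cost_usd: int
    est_days: int


replication_menu = [
    ReplicationOption("CRO", "sha256:<protocol hash>", est_cost_usd=25_000, est_days=60),
    ReplicationOption("core facility", "sha256:<protocol hash>", est_cost_usd=8_000, est_days=30),
    ReplicationOption("MTurk survey rerun", "sha256:<protocol hash>", est_cost_usd=1_500, est_days=7),
]

cheapest = min(replication_menu, key=lambda o: o.est_cost_usd)
print(f"Cheapest replication path: {cheapest.venue} at ~${cheapest.est_cost_usd:,}")
```
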
8) Again, in computer science, we already do this. How much does it cost to replicate GPT-3 on AWS? These folks estimated it was ~$4.6M about six months ago, a cost that has likely dropped.
https://lambdalabs.com/blog/demystifying-gpt-3/
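
A rough back-of-envelope for this kind of estimate, using the common ~6 * parameters * tokens approximation of training compute; the throughput, utilization, and pricing numbers are assumptions for illustration, not the figures behind the Lambda Labs post:

```python
# Rough back-of-envelope for "cost to replicate" a large training run, using the
# common ~6 * parameters * tokens estimate of training FLOPs. The hardware
# throughput, utilization, and price below are assumptions for illustration.
params = 175e9                 # GPT-3 parameter count
tokens = 300e9                 # training tokens reported for GPT-3
total_flops = 6 * params * tokens          # ~3.15e23 FLOPs

peak_flops_per_gpu = 312e12    # assumed: A100 peak at 16-bit precision
utilization = 0.3              # assumed: realistic fraction of peak in practice
price_per_gpu_hour = 3.0       # assumed: cloud on-demand $/GPU-hour

gpu_hours = total_flops / (peak_flops_per_gpu * utilization) / 3600
cost_usd = gpu_hours * price_per_gpu_hour
print(f"~{gpu_hours:,.0f} GPU-hours, ~${cost_usd / 1e6:.1f}M")  # order of magnitude only
```
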
9) Why is replication important?

Because science isn’t about prestigious people at prestigious universities publishing in prestigious journals, echoed by prestigious outlets.

That’s how we get “masks don’t work, now they do, because science”.

Science is independent replication.
10) Some of the people at those prestigious institutions are legitimately very intelligent and hard-working, capable of discovering new things and building functional products.

But on the whole, the *substitution* of prestige for independent replication isn’t serving us well.
11) Civilization has a ripcord, a break-glass mechanism people use when centralized institutions get too ossified. It’s called decentralization.

Luther used it to argue for the “personal relationship with God”, disintermediating the Church. Washington used it. So did Satoshi.
12) Given how many hunches have been marketed to us this year as science (“travel bans don’t work, the virus is contained”), the emphasis on independent replication as the core of science is just such a ripcord.

Only trust as scientific truth what can be independently verified.