Getting close to finally writing this paper:

1) The observational/RCT dichotomy is false and harmful.
2) RCTs are observational studies with VERY plausible exogenous variation in X.
3) All practical causal inference (CI) studies (RCTs too!) have hypothetical, impossible target trials.
4) ^ leads to a universal framework https://twitter.com/NoahHaber/status/1331227887878463489
It's strange that we have nearly completely separate frameworks for thinking about RCTs and observational studies, when they're really playing in the same playground.

When we hold them to different standards, we give a lot of issues a "pass" that we shouldn't.
The key here is that what is *feasible* is completely irrelevant to strength of causal evidence. Weak studies don't get better just because a better study isn't possible; they're still weak studies.

The good news is: we more or less already have the tools to think about this.
In effect, this is a slight generalization of the target trials framework.

The top-line difference is that we can apply the target trials framework to any and all causal inference in practice, including RCTs.

After all, no RCT perfectly answers the question its authors want it to answer, either.
So, we consider an RCT as simply an observational study in which we have forced the variation in X to be plausibly exogenous.
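
To make that concrete, here's a minimal simulation sketch (mine, not from the thread): the same linear outcome model with an unmeasured confounder U, estimated once with confounded treatment uptake and once with randomized assignment. The variable names, effect sizes, and sample size are all illustrative assumptions.

```python
# A minimal sketch of the point above: an RCT is "just" an observational
# study where the variation in X is exogenous by construction. Here U
# confounds X and Y in the observational arm; randomization severs the
# U -> X link. All quantities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0  # assumed causal effect of X on Y

u = rng.normal(size=n)  # unmeasured confounder

# Observational study: U drives both treatment uptake and the outcome.
x_obs = (u + rng.normal(size=n) > 0).astype(float)
y_obs = true_effect * x_obs + 2.0 * u + rng.normal(size=n)

# RCT: identical outcome model, but X is a coin flip, independent of U.
x_rct = rng.integers(0, 2, size=n).astype(float)
y_rct = true_effect * x_rct + 2.0 * u + rng.normal(size=n)

def diff_in_means(y, x):
    return y[x == 1].mean() - y[x == 0].mean()

# Same data-generating process for Y in both arms; only the source of
# variation in X differs, and only the randomized arm recovers ~1.0.
print(f"Observational estimate: {diff_in_means(y_obs, x_obs):.2f}")  # biased, ~3.3
print(f"RCT estimate:           {diff_in_means(y_rct, x_rct):.2f}")  # ~1.0
```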

We figure out what hypothetical trial the study was trying to inform (usually based on a decision), and apply that standard to the study.
Up to this point, you may be thinking I am trying to "diminish" RCTs relative to observational studies. That couldn't be further from the truth.

Because now we get to hold all studies to RCT standards too. And boy howdy, that's a tough standard if you don't get to randomize X.
The current state of observational research is extremely poor, and when you hold it to a reasonable universal standard on the same scale as RCTs, that becomes immediately clear.

But of course, current standards aren't fixed; they can and should change.
Step 1 for reviewing (and generating) any and all causal inference is figuring out what idealized question you want to inform.

Step 2) Consider the hypothetical target trial for that study, which is *independent of feasibility* (you can spawn universes if ya like)
Step 3) Determine what assumptions separate the study *in practice* from the study *in theory*

(Note: there are practical ways to do this, adapting from existing review tools and CI evaluation methods.)

Step 4) Evaluate the plausibility and importance of those assumptions.
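
One concrete, hedged illustration of what Steps 3 and 4 can look like in practice (my example, not the thread's) is quantitative bias analysis. The E-value of VanderWeele & Ding (2017) asks how strong an unmeasured confounder would have to be, on the risk-ratio scale, to fully explain away an observed association. The RR = 2.0 below is purely illustrative.

```python
# A sketch of one existing CI evaluation method for Step 4: the E-value
# (VanderWeele & Ding 2017, Annals of Internal Medicine). It is the
# minimum strength of association an unmeasured confounder would need
# with both treatment and outcome to fully explain away an observed
# risk ratio. The input RR here is an illustrative assumption.
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (point estimate)."""
    if rr < 1:           # for protective effects, work with the inverse
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# Example: an observational study reports RR = 2.0. A confounder
# associated with both exposure and outcome by risk ratios of ~3.4
# could explain the result away; weaker confounding could not.
print(f"E-value for RR = 2.0: {e_value(2.0):.2f}")  # 3.41
```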
Worth noting that this isn't some brilliant new insight that I am claiming as mine; many others have had similar ideas and proposals.

At best, this is just combining existing theoretical thought from a few fields into a slightly different practical configuration (notes irony).
Proposed catchy title: "The glitter standard: Toward a unified framework for evaluating strength of causal inference"