This discussion was getting long, so I thought I'd lay out my thoughts on a common argument: should models produce probabilities or decisions? I.e. "32% chance of cancer" vs "do a biopsy".

I favour the latter, because IMO it is both more useful and... more honest. IMO:

1/13 https://twitter.com/laure_wynants/status/1288131085797294080
The argument against using a threshold to determine an action, at a basic level, seems to be:

1) you shouldn't discard information by turning a range of probabilities into a binary
2) probabilities are more useful at the clinical coalface

2/13
Re: 1.

No model discards information. The continuous output score always exists. It is how you make use of that information at point of care that "changes".

I use air quotes around "changes", because this is a ... false dichotomy 😆
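A minimal sketch of what I mean, not anyone's real system: `recommend`, `ModelOutput` and the 0.3 threshold are all made up for illustration. The decision is just a view on the score, and the score is still sitting there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOutput:
    score: float  # continuous model output, kept as-is
    action: str   # what is shown at point of care


def recommend(score: float, threshold: float = 0.3) -> ModelOutput:
    """Attach a binary recommendation to a score without discarding it.

    The 0.3 default is a placeholder, not a clinical value.
    """
    action = "biopsy" if score >= threshold else "no biopsy"
    return ModelOutput(score=score, action=action)


out = recommend(0.32)
print(out.action, out.score)
```

Nothing is lost: anyone who wants the number can still read `out.score`.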

3/13
Model outputs for users are *always* discrete. With a probability from 1-100, you have chosen 100 discrete levels. With one decimal place, you have 1000 discrete levels.

The question isn't discrete vs not, but "how many different decisions could a clinician make?"
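To make the counting concrete, here is a rough sketch. The function name and the 30% cut-off are hypothetical, chosen only to show many displayed levels collapsing into two actions.

```python
def displayed_levels(decimals: int) -> set:
    """Every percentage value a display could show, from the smallest
    step up to 100, at the given number of decimal places."""
    step = 10 ** decimals
    return {k / step for k in range(1, 100 * step + 1)}


whole_numbers = displayed_levels(0)  # 1, 2, ..., 100
one_decimal = displayed_levels(1)    # 0.1, 0.2, ..., 100.0

# Hypothetical cut-off of 30%: however fine the display, only two actions come out.
actions = {"aggressive" if p >= 30 else "conservative" for p in one_decimal}

print(len(whole_numbers), len(one_decimal), len(actions))
```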

4/13
Imagine a model used to decide if a patient is a high risk covid case. The model will guide treatment plans.

So how many treatment plans are there?

Do we think there are 1000? 100?

In almost all clinical tasks, there are 2. Aggressive vs conservative therapy.

5/13
So the question changes. If medical decision making inherently involves "discarding information" to make binary choices, the question isn't "dichotomise or not", but rather *who* should dichotomise.

Should it be the clinician user or the model developer?

6/13
I get why many statisticians think it should be the user. They are domain experts, with access to other information about the patient.

But this is wrong. The end user is very poorly placed to make this choice.

7/13
Are developers better?

Of course they are! They have a big dataset, so instead of relying on the variable experience and quantification abilities of end users, developers get to use evidence!

A developer-chosen threshold is based on data. A user threshold is based on 🤪
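Here is one way "based on data" could look, as a sketch only. The `choose_threshold` helper, the toy scores and labels, and the 5:1 cost ratio for missed cases vs false alarms are all assumptions I made up for illustration; a real developer would pick costs with clinical input and validate on held-out data.

```python
def choose_threshold(scores, labels, cost_fn=5.0, cost_fp=1.0):
    """Return (threshold, cost) with the lowest total misclassification cost.

    A case is called positive when score >= threshold.
    cost_fn and cost_fp are illustrative assumptions, not clinical values.
    """
    # Candidate thresholds: every observed score, plus "never call positive".
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = None, float("inf")
    for t in candidates:
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Toy data, invented for this example only.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, cost = choose_threshold(scores, labels)
print(threshold, cost)  # 0.35 1.0 on this toy data
```

The point is only that the threshold falls out of outcomes and stated costs, rather than out of one person's gut feel on the day.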

9/13
PS please don't argue that developers make bad choices too. Of course many model developers don't know what they are doing. But *all* clinicians are bad at this.

Clinicians are good at synthesising binary data! Chest pain + troponin > X = heart attack. Each element is binary!

10/13
This leads on to argument 2: that probabilities are more useful.

Not at all! No human can balance a 30% chance of cancer vs a 32% chance of cancer. This is #TMI.

Even in shared decision making, most patients prefer terms like "rare" and "almost certainly" vs 3% or 95%.

11/13
If we look at clinically useful algorithms in widespread use, they are all dichotomous. The Wells criteria and Ottawa ankle rules say "image vs not".

Framingham has 3 risk categories. QRISK >10% is treat with statins. ASPECTS = "treatment contraindicated or not".

12/13
It is undeniable that common models dichotomise *before the clinician*. This doesn't change with more complicated machine learning models.

And like these simple risk models, we can always provide probabilities post-hoc, like below.

It is just that clinicians won't care.

13/13
PS in the above model, a score >=3 gets you a ct scan.

I suppose I should tag the relevant folks #epitwitter 😁
You can follow @DrLukeOR.