Okay, while I'm not saying that MRI methods are ready to be clinical-level biomarkers yet, I think this study (and everyone else) needs to slow their roll on the conclusions.

PDF: https://journals.sagepub.com/doi/pdf/10.1177/0956797620916786?casa_token=UIPjXcEb-UIAAAAA:cavn9LGRCvgYCaEj7YHuK3rPXuwMLMtlQdYnxbwyb5t9J-h1RSpG93PmJiNgoMt3IzlyaLG7Y19x

THREAD 1/11 https://twitter.com/CT_Bergstrom/status/1277060876386721792
1) The intraclass correlation (ICC) when comparing task-linked BOLD responses across *all* voxels in the brain is 0.397. Since only a subset of voxels is actually relevant for a given task (i.e., task-related by a statistical test), this shouldn't be that surprising. 2/11
Activation in sensory areas will be *highly* unstable across runs/sessions/scanners/centers because it is modulated by subtle variability in inputs (e.g., brightness) and by factors like attention. Noise voxels will just be noise.

This will all pull down the ICC.

3/11
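To see how noise voxels drag down an average reliability estimate, here is a toy simulation (mine, not from the paper). It uses per-voxel split-session Pearson correlation as a simple stand-in for ICC; the subject/voxel counts and noise levels are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_task_voxels, n_noise_voxels = 50, 200, 800

# Task-engaged voxels: a stable per-subject signal plus session noise
signal = rng.normal(0.0, 1.0, (n_subjects, n_task_voxels))
sess1_task = signal + rng.normal(0.0, 0.6, signal.shape)
sess2_task = signal + rng.normal(0.0, 0.6, signal.shape)

# Noise voxels: no stable subject-level signal at all
sess1_noise = rng.normal(0.0, 1.0, (n_subjects, n_noise_voxels))
sess2_noise = rng.normal(0.0, 1.0, (n_subjects, n_noise_voxels))

def reliability(a, b):
    # Per-voxel test-retest correlation across subjects (proxy for ICC)
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

r_task = reliability(sess1_task, sess2_task)
r_all = np.concatenate([r_task, reliability(sess1_noise, sess2_noise)])
print(f"task voxels: {r_task.mean():.2f}, all voxels: {r_all.mean():.2f}")
```

The task-engaged voxels come out highly reliable, but averaging them together with a sea of noise voxels pulls the whole-brain figure way down, which is exactly the pattern the study reports.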
In fact, the study reports that the average ICC increases to 0.705 when only thresholded voxels are used (i.e., only those voxels deemed relevant for the task).

So when voxels are engaged in a task, those voxels tend to be pretty reliable!

4/11
2) For the region analysis, the authors use "thresholded" maps (Fig. 4), i.e., t-tests. There are 2 ways a t-test can fail to be "significant": too small a numerator (mean), too large a denominator (variability). We have no idea which is causing the variance in this figure. 5/11
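The two failure modes fall straight out of the one-sample t formula, t = mean / (sd / √n). A minimal sketch (my own, with made-up effect sizes) showing that a shrunken numerator and an inflated denominator both kill "significance":

```python
import numpy as np

def one_sample_t(x):
    # t = mean / (sd / sqrt(n)): a small mean OR a large sd both shrink t
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

rng = np.random.default_rng(1)
n = 200
strong = rng.normal(0.5, 0.5, n)    # clear effect, low variability
small_mean = rng.normal(0.05, 0.5, n)  # numerator too small
high_var = rng.normal(0.5, 3.0, n)     # denominator too large

t_strong = one_sample_t(strong)
t_small_mean = one_sample_t(small_mean)
t_high_var = one_sample_t(high_var)
print(f"strong: {t_strong:.1f}, small mean: {t_small_mean:.1f}, "
      f"high variance: {t_high_var:.1f}")
```

A voxel can drop out of a thresholded map for either reason, and the thresholded maps alone can't tell you which one happened.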
Within a region there may be both stable and unstable voxels. The study doesn't show voxelwise stability maps and instead bases its conclusions largely on thresholded activation patterns, so it's not possible to know whether the variability here is uniform, or to what degree. 6/11
3) Studies that actually try to build predictive models from task-related activity do not weight *all* voxels equally. Some voxels are relevant for predicting individual differences; some aren't. Assuming that a low ICC averaged across all voxels means biomarkers are infeasible is naive. 7/11
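A toy sketch of why weighting matters (my own simulation, not the paper's analysis; the ridge penalty and voxel counts are arbitrary): a cross-validated linear model that learns voxel weights can predict a simulated trait even when most voxels are pure noise, while an equal-weight average of all voxels predicts essentially nothing.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub, n_vox, n_informative = 200, 500, 20

X = rng.normal(0.0, 1.0, (n_sub, n_vox))      # voxelwise activations
w_true = np.zeros(n_vox)
w_true[:n_informative] = rng.normal(0.0, 1.0, n_informative)
y = X @ w_true + rng.normal(0.0, 1.0, n_sub)  # trait = few voxels + noise

# Split-half cross-validation with a hypothetical ridge penalty
tr, te = slice(0, 100), slice(100, None)
lam = 10.0
w_hat = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(n_vox),
                        X[tr].T @ y[tr])
r_weighted = np.corrcoef(X[te] @ w_hat, y[te])[0, 1]

# Naive predictor: weight every voxel equally
r_uniform = np.corrcoef(X[te].mean(1), y[te])[0, 1]
print(f"weighted model r = {r_weighted:.2f}, uniform average r = {r_uniform:.2f}")
```

The point is not this particular model; it's that predictive approaches let the data decide which voxels carry individual-difference signal, so an average ICC computed over all voxels says little about what a weighted model can achieve.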
In fact, there are many techniques used in our field (e.g., crossnobis estimators, PCM, any model with proper cross-validation) that just wouldn't work, but demonstrably do, if fMRI were as unreliable as claimed.

This is all but ignored in the conclusions.

9/11
In summary:

Do the authors raise valid concerns that we have to address as a field? Yes.

Do their data invalidate the feasibility of fMRI as a marker of individual differences? Absolutely not.

11/11
You can follow @tdverstynen.