Okay, while I'm not saying that fMRI measures are ready to be clinical-level biomarkers yet, I think this study (and everyone amplifying it) needs to slow their roll on the conclusions.
PDF: https://journals.sagepub.com/doi/pdf/10.1177/0956797620916786?casa_token=UIPjXcEb-UIAAAAA:cavn9LGRCvgYCaEj7YHuK3rPXuwMLMtlQdYnxbwyb5t9J-h1RSpG93PmJiNgoMt3IzlyaLG7Y19x
THREAD 1/11 https://twitter.com/CT_Bergstrom/status/1277060876386721792
1) The intraclass correlation (ICC) when comparing task-linked BOLD responses across *all* voxels in the brain is 0.397. Since only a subset of voxels is actually relevant for a given task (i.e., task-related by a statistical test), this shouldn't be that surprising. 2/11
Activation in sensory areas will be *highly* unstable across runs/sessions/scanners/centers because it is modulated by subtle variability in inputs (e.g., stimulus brightness) and by factors like attention. Noise voxels will just be noise.
This will all pull down the ICC.
3/11
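A toy simulation of this point (not the paper's analysis; subject counts, voxel counts, and signal/noise levels are all made up, and per-voxel test–retest correlation stands in for a consistency ICC): averaging reliability over a mass of no-signal voxels drags the whole-brain number way down, even when the task voxels themselves are solidly reliable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub = 100  # hypothetical number of subjects

def simulate_voxels(n_vox, signal_sd, noise_sd):
    """Per-subject 'true' activation plus independent session noise."""
    true = rng.normal(0.0, signal_sd, (n_sub, n_vox))
    session1 = true + rng.normal(0.0, noise_sd, (n_sub, n_vox))
    session2 = true + rng.normal(0.0, noise_sd, (n_sub, n_vox))
    return session1, session2

def mean_reliability(s1, s2):
    # Per-voxel test-retest correlation across subjects (ICC stand-in)
    rs = [np.corrcoef(s1[:, v], s2[:, v])[0, 1] for v in range(s1.shape[1])]
    return float(np.mean(rs))

# Task voxels carry stable individual differences; noise voxels carry none.
task1, task2 = simulate_voxels(200, signal_sd=1.0, noise_sd=0.6)
noise1, noise2 = simulate_voxels(800, signal_sd=0.0, noise_sd=1.0)

r_task = mean_reliability(task1, task2)
r_all = mean_reliability(np.hstack([task1, noise1]), np.hstack([task2, noise2]))
print(f"task voxels only: {r_task:.2f}")  # high
print(f"all voxels:       {r_all:.2f}")   # dragged toward zero by noise voxels
```

Same data, two summaries: averaging over everything buries the reliable subset.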
In fact, the study reports that the average ICC when thresholded voxels are used (i.e., only the voxels deemed relevant for the task) rises to 0.705.
So when voxels are engaged in a task, those voxels tend to be pretty reliable!
4/11
2) For the region analysis, the authors use "thresholded" maps (Fig. 4), i.e., t-tests. There are 2 ways a t-test can fail to reach significance: too small a numerator (the mean) or too large a denominator (the variability). We have no idea which is driving the variance in this figure. 5/11
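A quick illustration of the two failure modes (hand-made numbers, two hypothetical "voxels" — nothing here comes from the paper): both land below the usual cutoff, but one fails because the effect is near zero and the other because the variability is huge.

```python
import numpy as np
from scipy import stats

# Voxel A: numerator problem -- mean is essentially zero.
small_mean = np.array([0.1, -0.2, 0.15, -0.1, 0.05, 0.0, -0.05, 0.1])
# Voxel B: denominator problem -- mean is clearly positive, but sd is huge.
big_noise = np.array([6.0, -4.0, 8.0, -5.0, 7.0, -6.0, 9.0, -3.0])

for name, x in [("small numerator", small_mean), ("large denominator", big_noise)]:
    t, p = stats.ttest_1samp(x, 0.0)
    print(f"{name}: mean={x.mean():+.2f}, sd={x.std(ddof=1):.2f}, "
          f"t={t:.2f}, p={p:.3f}")
```

A thresholded map shows only "not significant" for both, erasing the distinction between a missing effect and a noisy one.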
Within a region there might be stable and unstable voxels. The study doesn't show voxelwise stability maps and instead bases its conclusions largely on thresholded activation patterns. It's not possible to know whether the variability here is uniform, or to what degree. 6/11
3) Studies that actually try to build predictive models from task-related activity do not weight *all* voxels equally. Some are relevant for predicting individual differences, some aren't. Assuming that low ICC across all voxels means biomarkers are infeasible is naive. 7/11
If fMRI measures were as unreliable as claimed, then we wouldn't be able to predict individual differences under cross-validation. But we can!
For example:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008686/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5634271/
https://www.jneurosci.org/content/31/2/439.short
(review here)
https://www.biologicalpsychiatryjournal.com/article/S0006-3223(20)30111-6/pdf
8/11
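A minimal sketch of the logic (synthetic data, made-up dimensions; not reproducing any of the studies above): a regularized model trained on some subjects and scored on *held-out* subjects can only predict if the voxel patterns carry some stable, subject-specific signal. With zero reliability, out-of-sample prediction would sit at chance.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub, n_vox = 200, 50

# Hypothetical setup: a few voxels carry signal about a behavioral score,
# most are irrelevant -- which is why models shouldn't weight voxels equally.
w_true = np.zeros(n_vox)
w_true[:5] = 1.0
X = rng.normal(size=(n_sub, n_vox))                   # per-subject voxel patterns
y = X @ w_true + rng.normal(scale=1.0, size=n_sub)    # behavioral measure

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# 5-fold cross-validation: fit on 4 folds, predict the held-out subjects.
folds = np.array_split(rng.permutation(n_sub), 5)
preds = np.empty(n_sub)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n_sub), test_idx)
    w = ridge_fit(X[train_idx], y[train_idx])
    preds[test_idx] = X[test_idx] @ w

r = np.corrcoef(preds, y)[0, 1]
print(f"out-of-sample prediction r = {r:.2f}")
```

The point: cross-validated prediction accuracy is itself a reliability check, because it cannot exceed what the measurement's stable signal supports.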
In fact, there are many techniques used in our field (e.g., crossnobis estimators, PCM, any model with proper cross-validation) that just wouldn't be feasible (but are) if fMRI were so unreliable.
This is all but ignored in the conclusions.
9/11
4) Not all tasks are the same. A lot of experimental design factors can determine when a task is or isn't relevant for predicting individual differences.
(e.g., https://academic.oup.com/scan/article/doi/10.1093/scan/nsaa050/5821247?searchresult=1)
Crappy tasks have low reliability, regardless of whether they're done in an MRI.
10/11
In summary:
Do the authors raise valid concerns that we have to address as a field? Yes.
Do their data invalidate the feasibility of fMRI as a marker of individual differences? Absolutely not.
11/11