@sherbino This is one of my all-time favourites. My thoughts: raters (consciously or unconsciously) choose to pay attention to specific aspects of trainee performance when formulating a judgement/assessment. Some of these are task-relevant (e.g., tissue handling) https://twitter.com/sherbino/status/1274042250260070402
And some are task-irrelevant (e.g., how the trainee performed on yesterday’s difficult c/section). But raters can’t attend to EVERYTHING; our minds simply aren’t capable of it.
We design assessments in a certain way to try to direct raters toward the specific aspects of performance that we have (in some imperfect way) decided are important. But we know raters don’t always follow that guidance and instead fall back on their own mental models.
@drjfrank mentioned that with many raters and many assessments, the variability comes out in the wash (paraphrasing), and that makes sense. There will be overlap between raters’ mental models, plus some individual nuance. So do we really need a computer to do the assessing for us?
I think there are two main benefits to AI here: (1) with innumerable data points, an algorithm may be able to identify performance metrics we haven’t thought of before; (2) computers don’t care how last week’s c/s went, they focus on the data in front of them.
So, what if we could use machine learning to develop new assessments and refine current ones? Can we use machine learning to filter out the task-irrelevant bias that affects how trainees are scored?
Will machine learning become a program evaluation tool that helps us explore the validity of our assessments as we strive to improve equity and counteract bias? What happens when we add computer-based “thinking” to the mix? It will be interesting, no doubt.