Here's my take on the government A level and GCSE fiasco. I should disclose that (a) I have an interest in this, with two children who have just got GCSE results and (b) I'm by no means an expert in this specific field, but with experience in other uses of statistical modelling.
1/ In chronological order: the decision to abandon the exams hastily and immediately was, in retrospect, an error. It is also a curious one in a country which tests children more frequently, and attaches more importance to those tests, than most of its peers.
2/ Clearly little thought was given at the time to the consequences, or to the fact that any replacement grading system would involve painful compromises. My hunch is that too much confidence was placed in the ability of a model to adjust grades accordingly.
3/ My second hunch is that much of this confidence came from the fact that Dominic Cummings is someone who is in love with the idea of applying scientific methodology and using data, without the experience of applying these techniques in the messy real world.
4/ Like a lot of people who have had some success with this ideology (cf. Brexit), he has taken that success as internal validation of his approach, and so he will continue to believe that data and science can achieve the impossible.
5/ The impossible here is avoiding twin dilemmas: grade inflation and unfairness.
6/ Grade inflation is problematic for a number of reasons, but the most apposite right now is this: if the inflation isn't accounted for in university or sixth form college offers, then the process of rationing places by grades will fail.
7/ A Russell group uni might find that the 20% of offers it makes on the assumption that students will not get the required grades now have to be honoured. It then has too many students and not enough places, though a dearth of overseas students (thanks to Brexit/COVID) may create capacity.
8/ Then a second tier uni which would normally pick up that 20% as second choices or through clearing will not have sufficient students.
9/ Unfairness can be defined simply: students do not get the grade they would have been expected to get had the exams taken place.
10/ In a fantasy world students can be allocated the grades they deserve (no unfairness) by a process that also ensures that the outcome across the population meets the desired distribution (no grade inflation).
11/ Sadly it's unlikely that a teacher assessment will produce zero grade inflation, or even zero unfairness. You can (as the government did) exhort teachers and schools to avoid grade inflation, but there is too much motivation for teachers to inflate.
12/ This is especially true if they think that everyone else is inflating grades (insert game theory reference here). Except in the unlikely event that all teachers inflate equally, this process will also lead to unfairness.
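The game theory reference above can be made concrete with a tiny payoff table. These numbers are purely my own illustration (nothing from OFQUAL or the thread): the point is that whatever other schools do, inflating is the better response, which is the structure of a prisoner's dilemma.

```python
# Illustrative payoffs for one school, on a 0-3 scale (my invention):
# how well do my pupils fare, given my choice and everyone else's?
payoffs = {
    # (my_choice, others_choice): my pupils' outcome
    ("honest", "honest"): 2,    # fair outcome all round
    ("honest", "inflate"): 0,   # my pupils fall behind inflated peers
    ("inflate", "honest"): 3,   # my pupils gain a relative edge
    ("inflate", "inflate"): 1,  # rampant inflation, grades devalued
}

for others in ("honest", "inflate"):
    best = max(("honest", "inflate"), key=lambda me: payoffs[(me, others)])
    print(f"If other schools play {others!r}, my best response is {best!r}")
```

Inflating is the best response in both cases, so it is the dominant strategy, even though everyone inflating (payoff 1) is worse for all than everyone staying honest (payoff 2).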
13/ But there is a perception that teacher-allocated grades are fairer than model grades. Explanations include that the models are poor (more in a moment), that 'teachers know young people* better', that teachers have credibility, or, cynically, that teacher grades are higher.
*/ Aside: when did teenagers become 'young people'? At what point does a child become a 'young person'?
14/ On to the model. There is a list of issues with any statistical model, which the grading models used by OFQUAL seem to fit** very nicely.
** obligatory statistical joke
15/ (i) Over-simplification, (ii) Insufficient transparency, (iii) Outliers, (iv) Small sample sizes, (v) Biases
16/ All models lack sufficient depth and complexity to precisely model the real world. This need not be a bad thing, and indeed can be a good one, because simple models are more intuitive and may perform better. However, simplicity is a weapon that can be used against any model.
17/ We do not actually know how complex this model is, because there is a complete lack of transparency about it. As a rule, when a model is used by a public body it should be fully disclosed. Imagine if the rules for benefit eligibility were secret.
18/ (It would be nice if private models were also released; e.g. credit scoring models have a huge effect on people's lives but are pretty opaque.)
19/ Outliers are a problem for any model. But an outlier here is an 18 year old appearing on radio or TV in tears because she has been downgraded and can't get into her chosen university. I don't think this was properly appreciated by the people constructing the model.
20/ Small sample sizes are a problem for models. OFQUAL went down the route of modelling each school. This is seen as fairer (imagine the furore if income and race, which do predict grade outcomes, had been used as predictors), but isn't: many schools have income/race biases.
21/ This means that schools with small cohorts can't use the model, and keep their teacher grades. The bias towards independent schools this produces should perhaps have been obvious at the time.
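A quick simulation (entirely my own, not OFQUAL's analysis) shows why small cohorts are a problem: the smaller the historical cohort, the noisier its estimate of the school's underlying grade distribution, so a per-school baseline is very unreliable for tiny schools.

```python
import random

random.seed(1)
TRUE_A_SHARE = 0.2  # assume 20% of pupils at this school "truly" merit an A

spreads = {}
for cohort in (5, 30, 200):
    estimates = []
    for _ in range(1000):
        # Simulate one year's cohort and record the observed share of As.
        a_count = sum(random.random() < TRUE_A_SHARE for _ in range(cohort))
        estimates.append(a_count / cohort)
    spreads[cohort] = max(estimates) - min(estimates)
    print(f"cohort of {cohort:>3}: observed A-share varies over a range of {spreads[cohort]:.2f}")
```

The range shrinks as cohorts grow: a tiny school's historical results are a very noisy baseline, which is one reason small cohorts were left with their teacher grades.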
22/ There were other biases in the model-adjusted results. It isn't clear whether these arose because (say) teachers in northern England were more optimistic than others, and thus saw larger downgrades; because the model was flawed; or because the model reproduced biases in the data.
23/ Ultimately, it was going to be a very hard sell to get acceptance for the model, but errors in the modelling process made it basically impossible. It's one thing to use a model to understand the covariates of grade outcomes, another to predict grades using those covariates.
24/ We can think about alternatives; for example, telling teachers to rank pupils and not forecast grades, essentially relying only on the model. Still there would be comparisons with mock results and recent reports.
25/ We can imagine a world in which the model was very simple, given to head teachers to run, and produced a grade distribution based on teacher rankings and the previous results for the school. This would still be unfair to small cohorts, or to schools showing improvement.
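A sketch of what such a very simple model could look like, under my own assumptions about the mechanics (this is not OFQUAL's actual algorithm): take the teacher's rank order and carve it up according to the school's historical grade distribution.

```python
def allocate_grades(ranked_pupils, historical_shares):
    """Assign grades by rank: ranked_pupils is best-first, and
    historical_shares is a list of (grade, fraction) pairs summing to 1."""
    n = len(ranked_pupils)
    grades, cumulative, boundaries = {}, 0.0, []
    for grade, share in historical_shares:
        cumulative += share
        boundaries.append((grade, round(cumulative * n)))  # cut-off index
    i = 0
    for grade, upto in boundaries:
        for pupil in ranked_pupils[i:upto]:
            grades[pupil] = grade
        i = upto
    return grades

# Hypothetical school whose past results were 20% A, 50% B, 30% C.
shares = [("A", 0.2), ("B", 0.5), ("C", 0.3)]
pupils = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10"]
print(allocate_grades(pupils, shares))  # two As, five Bs, three Cs
```

Even this toy version shows the unfairness baked in: with a cohort of three, rounding alone decides whole grades, and a genuinely improving school is still chained to its past distribution.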
26/ We can pretend that pure teacher assessments were used from the very start. There would be rampant grade inflation (as now), there would probably be some unfairness (as now), but from a PR perspective this would have been much, much better.
27/ It would be pointless to point out that this issue has been managed appallingly: the culture of secretiveness, 'we know best', 'data is cool', a fig leaf of consultation (I actually replied to the consultation document), a refusal to be accountable, followed by a U-turn.
28/ Yes, these are the hallmarks of anything that this government has done.
29/ 'All models are wrong, but some are useful'. The OFQUAL model could have been useful, or at least produced a better outcome, but one thing is sure: This government is both wrong and useless.
You can follow @investingidiocy.