Student evaluations of teaching (SETs) are problematic bc of known measurement & equity biases. In this new pub, @jenniesweetcush & I provide a comprehensive but nuanced overview of the literature. We make recommendations on how to use SETs responsibly. https://link.springer.com/epdf/10.1007/s10805-021-09400-w
Our goal with this paper is to provide a careful look at the nuances of SET biases. It's not as simple or straightforward as "SETs are biased against women and POC." The dynamics of SET biases are conditional: in some contexts, they actually advantage women and POC.
Major findings: There is plenty of evidence of both measurement and equity biases. As a result, it’s absolutely critical that universities and colleges carefully evaluate how these are administered, and the role they play in personnel decisions.
SETs are a MAJOR topic on #academictwitter. We know this issue affects people on a personal & professional level. We hope our work empowers the advocacy of those pushing to eliminate SETs *and* provides practical recommendations for the continued use of SETs in a less bad way.
Before I get to the details of our findings: There are some meta-analyses of SET biases. But they don't capture the complexity of SET bias. In particular, meta-analyses require comparison of similar or identical variables & models across studies. Three types of data are generally eliminated.
1) Quantitative data from sources like RateMyProfessors (RMP), non-SET surveys, and SET questions with unique wording; 2) experimental designs; 3) qualitative, open-answer sections of SETs, where sexist, racist, or homophobic comments emerge.
As a result, we take a different approach. We read, and read, and categorized, and read again the many articles on SET biases. Our goal with this paper is to explain the complexity of evidence on SET biases & provide pragmatic recommendations for how to use this imperfect tool.
Measurement bias occurs when variables unrelated to teaching influence SETs. Examples of course characteristics that bias evals: class time, class size, lecture format, difficulty, discipline. Examples of student characteristics: interest in material, previous coursework.
Equity bias occurs when variables outside the instructor's control influence SETs: the instructor's gender, race, ethnicity, accent, sexual orientation, disability. Evidence on equity bias is mixed & clearest in qual comments. It's clear equity bias, in particular, is highly conditional.
We go into detail about the evidence for equity bias. Bc this is the bulk of the online discussion of SETs, I want to share with twitter some of the nuance. Equity bias is more prominent in some disciplines; students evaluate profs through a gendered lens; and there's evidence of an affinity effect (students rate instructors who resemble them more favorably).
Most universities/colleges are unlikely to eliminate use of SETs anytime soon, in part bc of inertia & in part bc there aren't clear alternatives to SETs that aren't also prone to measurement or equity bias. Here are some practical recommendations on how to use SETs in a less bad way.
1) Contextualize evals as perceptions of student learning - not a measure of actual teaching. When properly contextualized as feedback on experience (NOT teaching), they can provide some useful feedback for faculty and administrators about students' satisfaction with the course.
2) Increase the validity of SETs by increasing the response rate. Small samples are likely to be unrepresentative. Tips to improve RR: give time to complete in class (faculty should leave the room), explain how SETs are used by admin (most students don't know this!), and how faculty use SET feedback.
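To see why low response rates matter, here's a toy simulation (entirely made-up ratings for a hypothetical 100-student class, Python stdlib only): the fewer students who respond, the wider the range of "course averages" you could plausibly end up reporting.

```python
import random
import statistics

random.seed(1)

# Hypothetical class of 100 students whose "true" ratings skew positive.
population = [5] * 40 + [4] * 35 + [3] * 15 + [2] * 7 + [1] * 3

def spread_of_sample_means(n_respondents, trials=2000):
    """Range of course means you could observe at a given response count."""
    means = [statistics.mean(random.sample(population, n_respondents))
             for _ in range(trials)]
    return max(means) - min(means)

print(round(statistics.mean(population), 2))  # the "true" course mean: 4.02
print(round(spread_of_sample_means(10), 2))   # 10 respondents: wide spread
print(round(spread_of_sample_means(60), 2))   # 60 respondents: much narrower
```

The exact numbers are arbitrary; the point is that the same instructor can look very different depending on which handful of students happens to respond.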
An aside: There is a lot of online chatter about whether telling students about biases mitigates bias or inflames a backlash. Some experimental evidence suggests mitigation, but other studies don't. Many faculty have anecdotal evidence of backlash. We need more research.
3) Interpret results with caution. Ideally, evals should compare a faculty member's trajectory over time, within a course. Bc equity bias manifests through lower evals for astereotypic instructors, comparisons across faculty further disadvantage marginalized faculty.
Most faculty get mostly positive reviews, which don't follow a normal distribution (they have a negative skew). The mean of a skewed distribution is more influenced by outliers than the median is. Admin should look at the overall distribution and at median or modal responses, not means.
Unfortunately, reporting means seems to be the most common practice now. But this is a relatively easy thing to change! The reasoning for using median or modal values rather than the mean seems to resonate with many people. It's a pretty basic statistical argument.
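To make that basic statistical argument concrete, here's a toy example (hypothetical ratings, Python stdlib only): a handful of harsh outliers drags the mean down, while the median and mode stay at the typical student's response.

```python
import statistics

# Hypothetical end-of-term ratings for one course on a 1-5 scale:
# mostly 4s and 5s (negative skew), plus two harsh outliers.
ratings = [5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 1, 1]

print(round(statistics.mean(ratings), 2))  # 3.92: two 1s drag it below 4
print(statistics.median(ratings))          # 4: unaffected by the outliers
print(statistics.mode(ratings))            # 4: the most common response
```

With the mean, two disgruntled students can push an instructor below a "4.0 cutoff" that almost every other student would say they exceed; the median and mode don't move.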
Report ratings from multiple questions, not a single global question about overall teaching, to reduce the effect of measurement error.
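A sketch of why multiple questions help (simulated data; the "true score" and noise level are made-up numbers): independent measurement errors partially cancel when averaged, so the average of k questions has roughly 1/√k of the single-question noise.

```python
import random
import statistics

random.seed(0)

TRUE_SCORE = 4.0  # hypothetical instructor's "true" rating
NOISE_SD = 0.8    # hypothetical per-question measurement error

def observed_score(n_questions):
    """Average of n noisy question ratings from one administration."""
    return statistics.mean(random.gauss(TRUE_SCORE, NOISE_SD)
                           for _ in range(n_questions))

# Spread of reported scores across many simulated administrations.
sd_one_question = statistics.stdev(observed_score(1) for _ in range(5000))
sd_six_questions = statistics.stdev(observed_score(6) for _ in range(5000))

print(round(sd_one_question, 2))   # ~0.8
print(round(sd_six_questions, 2))  # ~0.33, about 0.8 / sqrt(6)
```

This only addresses random measurement error, not systematic bias, but it's a cheap improvement over a single global item.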
4) Restrict or eliminate use of qualitative comments. Across studies, the clearest evidence of equity bias is in the comment section. These can be brutal and extremely disempowering for faculty who get sexist, racist, homophobic comments.
Comments are hard to aggregate, come from small samples, and frequently contain contradictory feedback (a validity issue). Further, even careful reading of comments is prone to novelty bias (we're more likely to remember uncommon comments) and negativity bias (we're more influenced by negative information than positive).
I know personally how negative comments burn and linger, for years. They can make you more insecure and nervous about teaching. Particularly nasty comments can cause genuine mental anguish.
5) Admin should not rely on SETs as the sole method of assessing teaching. It's true that other forms of assessment, e.g. classroom observations, are also prone to bias. But those assessments aren't systematically biased in the same way. Better to use multiple imperfect tools than just one.
6) We need more research on interventions to reduce bias. What little research exists yields some promising leads. Reducing the size of the rating scale can mitigate bias, and RCTs that make students aware of biases may mitigate the gender gap, but the evidence here is contradictory.
I’ve published a fair amount of research as an AP, but this pub is the one I'm most eager to share with the world. The stakes here are so high - SETs are ubiquitous - but there are real steps we can take to make them less bad.
I’m happy to answer any questions you all have about this research. (RIP my mentions 😅) Apologies for any typos or lack of clarity above. This is the longest thread I've written!
I'll DM anyone a pdf who wants to read. My DMs are open! https://twitter.com/ProfPButton/status/1359156541035839497?s=20
You can follow @rebeccakreitzer.