I'm feeling happy and grateful about my first publication ever, with a bunch of wonderful collaborators and my stellar supervisor @maite_taboada 💓
#PLOSONE: The Gender Gap Tracker: Using Natural Language Processing to measure gender bias in media https://dx.plos.org/10.1371/journal.pone.0245533
1/n https://twitter.com/maite_taboada/status/1355241035505405953
Maite has highlighted our key results in her tweet thread that I quote-tweeted, and I wanted to add to that with a focus on gender and ethics. This project sought to quantify gender bias in the media by counting the proportion of people quoted in news text who are women. 2/n
To do so, we developed a system that analyzes articles published online by certain English-language Canadian news outlets, using parts we built in combination with off-the-shelf NLP tools. 3/n
If the "gender prediction" part of this is making your ethical spidey sense tingle, you are in good company! We go into some detail about this in our paper and I want to highlight some of the important stuff there + direct you to good resources! 4/n
Although we have three categories of gender—male, female and other—the 'other' category lumps together cases where the speaker's gender is unknown as well as where the speaker's gender is known to fall outside the binary. 5/n
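A minimal sketch of the counting step, assuming every quoted person has already been given one of the three labels above. The function name and the sample labels are hypothetical and are not taken from the paper's code.

```python
from collections import Counter

# The three categories described in the thread. "other" covers both
# unknown gender and gender known to fall outside the binary.
CATEGORIES = ("female", "male", "other")


def quote_source_proportions(labels):
    """Return the share of quoted people that falls in each category.

    `labels` is an iterable with one label per quoted person
    (a hypothetical input format, chosen for illustration).
    """
    counts = Counter(labels)
    unexpected = set(counts) - set(CATEGORIES)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        # No quoted people at all: report zero shares instead of dividing by zero.
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts[category] / total for category in CATEGORIES}


# Made-up labels for four quoted people.
example_labels = ["female", "male", "male", "other"]
proportions = quote_source_proportions(example_labels)
```

Reporting "other" as its own share, instead of dropping it, keeps the lumping described above visible in the numbers.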
Also, our system depends on databases that encode binary gender (which erases nonbinary people!) and on other systems that are known to have significant performance gaps for resolving instances of singular they and for English neopronouns (including my own - xe/xyr) 6/n
On using [first name] - [assigned sex at birth] data: "Clearly, this is a problematic practice, as it assumes that gender is binary, that sex and gender have perfect correlation, and that people’s names are accurate predictors of their sex or gender." 7/n
For a significant chunk of this research, we used an entity-based approach, where a label is associated with a particular real-life individual and their public gender identity (as defined by online resources). 8/n
This is _somewhat less problematic_, but still suffers from the dependency on "online resources" that may not be up-to-date and may still encode gender as binary. We do have some manual overrides for (famous) people who we know are systematically misgendered by these systems 9/n
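A minimal sketch of how a manual override can take precedence over a label from an online resource, assuming both are simple name-to-label tables. The table names, the function name, and the placeholder people are all hypothetical; the real system's data sources and lookup logic are described in the paper.

```python
# Hypothetical tables with placeholder names (not real people).
# RESOURCE_LABELS stands in for labels derived from online resources,
# which may be out of date or may encode gender as binary.
RESOURCE_LABELS = {
    "Person A": "female",
    "Person B": "male",
    "Person C": "male",  # assumed to be wrong for this person
}

# Manual corrections for people the upstream resources are known to misgender.
MANUAL_OVERRIDES = {
    "Person C": "female",
}


def label_for(name, overrides=MANUAL_OVERRIDES, resources=RESOURCE_LABELS):
    """Look up a label for a named individual.

    Order of precedence: manual override, then resource label,
    then "other" when nothing is known about the person.
    """
    if name in overrides:
        return overrides[name]
    return resources.get(name, "other")


corrected = label_for("Person C")  # override wins over the resource label
unknown = label_for("Person D")    # not in either table
```

Checking the override table first means a correction never gets silently undone when the upstream resource is refreshed.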
This all seems grim, but counterpoint: all previous manual attempts at quantifying the gender gap in media quotes have required orders of magnitude more time and labour, while still suffering from many of the same assumptions, made by annotators instead of our system. 10/n
E.g., human annotators working on this task would still be _assigning_ gender to people by reading an article, i.e., they'd still correlate certain names with a gender, assume pronouns unambiguously tell you someone's gender, maybe conceive of gender as being binary, etc. 11/n
Gender is fluid and personal, so "gender recognition" is impossible as a concept. This is important! At the same time, equitable representation in the media is also important. So how would you design a perfect experiment in a perfect world? Glad you asked! 12/n
The Correct™ way to do this would be to use self-reported gender (some caveats below)! Unfortunately news outlets don't currently widely collect this data from speakers :( So @ news outlets, please do more of this! 13/n
The caveats are mostly things I've mentioned above, i.e., don't assume binary gender, don't assume gender correlates directly with pronouns or that coreference analysis is going to be easy, don't assume that someone's pronouns tell you their gender, etc. 14/n
BUT ALSO, making a 3-way radio button that says male-female-nonbinary or male-female-other is Incorrect™! If you are considering research/surveys along these lines, I want to point you to a couple of resources and people. 15/n
First, please read this article titled How To Do Better with Gender on Surveys: A Guide for HCI Researchers, by @katta_spiel, @oliverhaimson and @dlottridge 16/n
https://interactions.acm.org/archive/view/july-august-2019/how-to-do-better-with-gender-on-surveys
Next, please watch this excellent (captioned) talk by @kirbyconrod titled "How to do things with gender" 17/n
Now read through @queerterpreter's thread on good and bad ways to ask for gender data in surveys (all images have alt text, Ártemis is a linguist+academic who studies grammatical gender and they're trans and nonbinary themself!) 18/n https://twitter.com/queerterpreter/status/1328843348707315714
If you're still hungry for more, read my note on my website where I include links to more resources on the subject, including of course the Cao-Daumé paper on inclusive coreference resolution 19/n
https://vasundharagautam.com/work.html#gender-gap-tracker
I hope you enjoy our paper and that you got something out of this thread, and we're happy to field questions about the research! :) 20/20
You can follow @VasundharaNLP.