I'm grateful to @ResistanceAI for the opportunity to present some notes from a work in progress in which I consider the question: how does machine translation shift power? Some thoughts follow.
The practice of translation between human languages has often been shaped by power asymmetries -- for example, in the context of developing 'educational' materials that undergirded colonizing efforts.
It is worth examining how narratives of machine translation as a 'social good' that equalizes access to information implicitly valorize the knowledge produced by particular linguistic communities, as well as what that says about who gets to create 'vital' information.
It is also worth examining how machine translation has been used for surveillance. Consider the case of 'extreme vetting' of social media for foreign entrants into the United States, which involves using tools like Google Translate to translate non-English social media posts.
Considering that MT is prone to errors, might this practice fit (somewhat disturbingly) alongside other machine learning tools that attempt to render legible that which has been obscured or distorted (think PULSE)?
Lastly, current paradigms in neural MT demand large quantities of data and computing resources. To quote Claire Larsonneur, this 'shifts economic activity to a handful of tech giants as providers of translation': https://spheres-journal.org/contribution/the-disruptions-of-neural-machine-translation/
The data-driven, web-scraped paradigm, together with the state-of-the-art in NMT, also complicates questions of language ownership. Is there a lower bound on the number of sentences of a language sufficient for a morphosyntactic representation?
What recourse does a linguistic community have if they do not wish to entrust software companies with the development of tools in their language? Recall the Mapuches' unsuccessful case against Microsoft for the development of a Mapudungun version of Windows without their consent.
As @kerim writes, "Languages aren’t stolen the way property is stolen. Rather, people are denied the sovereignty necessary to shape their own cultural and educational practices."
On that note, I'm excited by participatory projects like @MasakhaneNLP that include native & heritage speakers of African languages in the development of NLP tools and software, involving the most impacted stakeholders in the research direction and data curation process.
Closing thoughts: this isn't just about machine translation, of course, but rather, the social and political contexts that have shaped its development and use. And of course, MT is useful! I use it nearly every day.
But because it is increasingly used in high-stakes scenarios (read @danyoel on this), we need to have a clear understanding of its limitations, alongside critical examinations of the theoretical assumptions and prevailing technological paradigms that shape machine translation.
I'm eager for feedback (and, if you're interested, collaborators?) as I try to push this work forward. I'm indebted to @annaeveryday for the space & support to develop these ideas and for feedback on an earlier, shittier draft of the work. If it still sucks that's on me, not her!
Also gotta thank folks @reboot_hq for feedback on pieces of this, too (hello @jessicadai_ !); @_julianmichael_ for listening to me complain about this paper for a full year; @alexhanna for feedback that'll go into a future version

Thanks! Oh and here is where the paper lives, for now: https://sites.google.com/view/resistance-ai-neurips-20/accepted-papers-and-media?authuser=0