2/ Computer systems should adapt to the ways people - all people - use language. West Africans have spoken their languages for thousands of years, creating rich oral history traditions. Computers could easily support this oral tradition. 💻💬
3/ Speech-based technology does exist; however, popular products do not “speak” any of the 2000 languages & dialects spoken by Africans. Apple’s Siri, Google Assistant, & Amazon’s Alexa collectively support 0 African languages.📵
4/ Because illiteracy tends to correlate with lack of schooling and, thus, with the inability to speak a common world language, speech technology is not available to those who need it the most. For them, such technology could bridge the gap between illiteracy and digital contributions.🗣️📳
5/ Why the gap? Languages spoken by smaller populations are often casualties of commercial prioritization.💰 Furthermore, groups with power over technological goods tend to speak the same few languages, making it easy to overlook those with different backgrounds.
6/ Speakers of languages such as those widely spoken in West Africa are grossly underrepresented in the research labs, companies and universities that have historically developed speech-recognition technologies.💻🌍
7/ ⬆️ All of these exacerbate another critical challenge: lack of data. Languages spoken by illiterate people who would most benefit from voice recognition tech tend to fall in the “low-resource” category, which, in contrast to “high-resource” languages, have few available datasets.
8/ Moussa, Chris, and I are tackling this problem. We developed the first speech recognition models for Maninka, Pular, and Susu, languages spoken by a combined 10 million people in seven countries with up to 68 percent illiteracy. Full paper 📜⬇️
https://mdoumbouya.github.io/nicolingua.pdf 
9/ We leveraged speech data that is abundantly available, even in low-resource languages: radio broadcasting archives. 📻 We are releasing our datasets, code, and models to the research community in hopes of catalyzing further efforts in these areas. https://github.com/mdoumbouya/nicolingua
10/ Computers are not yet sufficiently evolved to be useful in some societies. Our friends should not have to read and write a common language to contribute to scientific research, much less to merely interact with their smartphones. 🗣️📲
11/ Yes, it is challenging to create computers that understand the subtleties of oral communication in thousands of languages rich in oral features such as tone, as well as other high-level semantics. But where researchers turn their attention, progress can be made.🔍
12/ Innovation, access and safety demand that technology speak all of the world’s languages.

@StanfordEng @Stanford @PeaceCorps @WHOSTP
@FSIStanford @stanfordio @DigCivSoc @StanfordHAI @StanfordCyber @StanfordPACS @ai4allorg @stanfordnlp @StanfordAILab
You can follow @LisaEinstein.