Those who argue that curation is the answer to "more ethical AI/ML/LMs" come across as intellectually naive. Archivists have been grappling with this issue, literally, for millennia. They'd be well advised to consult some of the literature from that field.
Curation is not a silver bullet. Curation is itself an expression of power, subject to the same power dynamics as broader society. If anything, curation might amplify inequalities, because curation is expensive and only large, powerful institutions can afford to do it.
Thus, suggestions that researchers should "curate training datasets through a thoughtful process of deciding what to put in" are empty platitudes that are difficult to operationalize. Whose thought? Whose values? Anglo-American-centric ones?
Let's be honest: curation == bias. Proponents are essentially advocating a "corrective bias" that addresses inequalities. Surely they must admit the legitimacy of multiple perspectives? But this simply reframes the problem as "is my corrective bias better than yours?"
Here's a concrete example: suppose we build separate LMs, one from an African-American perspective and one from an Asian-American perspective. What if these conflict, as they did during the 1992 LA riots? https://en.wikipedia.org/wiki/1992_Los_Angeles_riots Suppose the application is search: "whose" results get shown?
Any resolution would necessarily involve another exercise of power, so it's not clear we've made any progress. tl;dr: curation may sound appealing, but it can't be the answer.
You can follow @lintool.