Thread: One of the biggest issues in AI today is a lack of care in choosing supervision signals for learning. This is one of the more extreme examples but I think it highlights something fundamentally wrong in the way we frame AI problems. https://twitter.com/katyanna_q/status/1278410099711569923
In this case, researchers wanted to use learning methods and they knew two things: 1) learning methods often need lots of data; and 2) it was prohibitively expensive to generate a labeled dataset with 80 million images.
As a result, they implemented an automatic labeling system that provided some truly objectionable labels (see the piece from the original tweet for details).
A second example comes from automated hiring: there, the system in question was trained on historical data and learned to replicate previous hiring biases against women.

I understand this as the result of similar pressures: 1) they needed a large dataset of examples to learn effectively; and 2) manually labeling that dataset is expensive.
IMO the labels you want here are labels of “according to our current corporate goals, this is someone we should or should not interview.”

The labels you have are “in the past, these are the people we did or did not interview/hire.”
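To make that gap concrete, here is a minimal sketch with synthetic data (the setup, variable names, and numbers are all hypothetical, not taken from any real system): a classifier fit on the labels we have — past interview decisions — reproduces the bias baked into them, even though it looks accurate against its own training signal.

```python
# Toy sketch: training on the labels you *have* (past hiring decisions)
# rather than the labels you *want* (who should be interviewed now).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

skill = rng.normal(size=n)            # relevant to the job
group = rng.integers(0, 2, size=n)    # stand-in protected attribute

# Labels we *want*: interview anyone with sufficient skill.
should_interview = (skill > 0.5).astype(int)

# Labels we *have*: past decisions that also penalized group == 1.
was_interviewed = ((skill - 1.0 * group) > 0.5).astype(int)

X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, was_interviewed)
pred = model.predict(X)

# The model faithfully reproduces the historical bias:
for g in (0, 1):
    rate = pred[group == g].mean()
    print(f"group {g}: predicted interview rate = {rate:.2f}")

# Accuracy against the proxy looks fine; against the intended labels it is worse.
print("accuracy vs. labels we have:", (pred == was_interviewed).mean())
print("accuracy vs. labels we want:", (pred == should_interview).mean())
```

The point is not the particular model; it is that nothing in this pipeline ever sees the labels we actually want.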
@math_rachel has a great thread that covers several examples of incorrect proxies in AI and their consequences: https://twitter.com/math_rachel/status/1176606580264951810
I was first exposed to this idea in an economics course on incentives my senior year of undergrad, when we covered what remains my favorite paper title of all time: “On the folly of rewarding A and hoping for B” (https://www.ou.edu/russell/UGcomp/Kerr.pdf)
It documents a series of case studies from hospitals, schools, militaries and other organizations where “reward systems... are fouled up in that the types of behavior rewarded are those which the rewarder is trying to discourage.”
Robert Gibbons’ lecture note is a great technical introduction to the space: http://web.mit.edu/rgibbons/www/903%20LN%201%20S10.pdf

In short, objective misalignment in human incentive systems is pervasive. We should expect (and have indeed observed) the same problems in artificial incentive systems.
As we move toward more automatic labeling and supervision of AI systems, and as systems get better at optimizing the metrics we give them, I worry that AI designers are getting further and further removed from the objectives their systems actually optimize.
Making progress on this will require, at its core, a shift in the way we frame AI problems: AI systems need to be designed and built with the recognition that the metrics and objectives we design are _always_ misspecified in some way.
This is a cultural shift with technological consequences. Labels and proxies are almost always a *subjective* expression of desired system behavior. In contrast, students of ML/AI are trained to think of labels as *objective* properties of the world/problem framing.
Making this shift will help reduce harms from AI systems because it emphasizes the responsibility designers have over system behavior.
For example, researchers often talk about a trade-off between performance and fairness. To my eye this is a false choice caused by 1) a misaligned performance metric; and 2) a problem framing that treats that metric as the ground truth definition of performance.
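Here is a minimal sketch of that false choice (again with synthetic data and hypothetical names): two decision rules are scored both against a biased historical proxy and against the label we actually care about. The rule with equal selection rates across groups looks like a performance loss under the proxy metric, yet it is strictly better under the intended one.

```python
# Sketch of the "false choice": the apparent fairness/performance trade-off
# depends entirely on which label "performance" is measured against.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
skill = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

should_interview = (skill > 0.5)                  # label we want
was_interviewed = (skill - 1.0 * group > 0.5)     # biased historical proxy

rule_proxy_optimal = (skill - 1.0 * group > 0.5)  # mirrors past decisions
rule_group_blind = (skill > 0.5)                  # equal treatment of groups

for name, rule in [("proxy-optimal", rule_proxy_optimal),
                   ("group-blind", rule_group_blind)]:
    print(name,
          "| selection rate by group:",
          [round(rule[group == g].mean(), 2) for g in (0, 1)],
          "| 'performance' vs. proxy:", round((rule == was_interviewed).mean(), 2),
          "| performance vs. intent:", round((rule == should_interview).mean(), 2))
```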
Beyond that, this shift will help AI systems navigate the jump from laboratories to industry more readily. @sh_reya’s recent reflections on dataset curation, in the context of gaps between research and application, are highly relevant here: https://www.shreya-shankar.com/making-ml-work/
That is all for now. Thank you for reading what became a much longer thread than I intended.

I’m including a couple relevant references to my work below for those who are interested.
In “Inverse Reward Design,” @SmithaMilli, @ancadianadragan, @pabbeel, Stuart Russell and I explore the implications of treating a proxy objective as information about an unobserved goal.

https://arxiv.org/abs/1711.02827 
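For readers who want the gist as math, here is a rough sketch of the core inference, written from memory and in my own notation, so treat the details as approximate: the proxy reward $\tilde{w}$ is modeled as likely to the extent that optimizing it in the training environment $\tilde{M}$ produces high true reward under $w^{*}$, and the agent inverts this observation model to maintain a posterior over the true reward.

```latex
% Sketch of the Inverse Reward Design observation model (notation approximate).
% phi(xi) are trajectory features; pi(w, M) is the behavior induced by
% optimizing reward weights w in MDP M; beta measures designer near-optimality.
\begin{align}
  P(\tilde{w} \mid w^{*}, \tilde{M})
    &\propto \exp\!\big(\beta \, w^{*\top} \,
       \mathbb{E}\big[\phi(\xi) \;\big|\; \xi \sim \pi(\tilde{w}, \tilde{M})\big]\big) \\
  P(w^{*} \mid \tilde{w}, \tilde{M})
    &\propto P(\tilde{w} \mid w^{*}, \tilde{M}) \, P(w^{*})
\end{align}
```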
In “Incomplete Contracting and AI Alignment,” @ghadfield and I explore the connections between the substantial economics literature on incomplete contracts and the analogous problem of misspecified objectives in AI/ML.

https://arxiv.org/abs/1804.04268 
I summarized some of these ideas in a blog post from 2017.

Reading back over it, I think it over-emphasizes the technical framing of the issue at times, but it still captures many of the important ideas. https://bair.berkeley.edu/blog/2017/08/17/cooperatively-learning-human-values/