Thread: One of the biggest issues in AI today is a lack of care in choosing supervision signals for learning. This is one of the more extreme examples but I think it highlights something fundamentally wrong in the way we frame AI problems. https://twitter.com/katyanna_q/status/1278410099711569923
In this case, researchers wanted to use learning methods and they knew two things: 1) learning methods often need lots of data; and 2) it was prohibitively expensive to generate a labeled dataset with 80 million images.
As a result, they implemented an automatic labeling system that provided some truly objectionable labels (see the piece from the original tweet for details).
A second example comes from automated hiring: there, the system in question was trained on historical data and learned to replicate previous hiring biases against women.

I understand this as the result of similar pressures: 1) they needed a large dataset of examples to learn effectively; and 2) manually labeling that dataset is expensive.
IMO the labels you want here are labels of “according to our current corporate goals, this is someone we should or should not interview.”

The labels you have are “in the past, these are the people we did or did not interview/hire.”
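To make that gap concrete, here is a minimal sketch with synthetic data (the setup, variable names, and numbers are all hypothetical, not taken from any real system): a classifier fit on the labels we have — past interview decisions — reproduces the bias baked into them, even though it looks accurate against its own training signal.

```python
# Toy sketch: training on the labels you *have* (past hiring decisions)
# rather than the labels you *want* (who should be interviewed now).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

skill = rng.normal(size=n)            # relevant to the job
group = rng.integers(0, 2, size=n)    # stand-in protected attribute

# Labels we *want*: interview anyone with sufficient skill.
should_interview = (skill > 0.5).astype(int)

# Labels we *have*: past decisions that also penalized group == 1.
was_interviewed = ((skill - 1.0 * group) > 0.5).astype(int)

X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, was_interviewed)
pred = model.predict(X)

# The model faithfully reproduces the historical bias:
for g in (0, 1):
    rate = pred[group == g].mean()
    print(f"group {g}: predicted interview rate = {rate:.2f}")

# Accuracy against the proxy looks fine; against the intended labels it is worse.
print("accuracy vs. labels we have:", (pred == was_interviewed).mean())
print("accuracy vs. labels we want:", (pred == should_interview).mean())
```

The point is not the particular model; it is that nothing in this pipeline ever sees the labels we actually want.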
@math_rachel has a great thread that covers several examples of incorrect proxies in AI and their consequences: https://twitter.com/math_rachel/status/1176606580264951810
I was first exposed to this idea in an economics course on incentives my senior year of undergrad, when we covered what remains my favorite paper title of all time: “On the folly of rewarding A and hoping for B” (https://www.ou.edu/russell/UGcomp/Kerr.pdf)
It documents a series of case studies from hospitals, schools, militaries and other organizations where “reward systems... are fouled up in that the types of behavior rewarded are those which the rewarder is trying to discourage.”
Robert Gibbons’ lecture note is a great technical introduction to the space: http://web.mit.edu/rgibbons/www/903%20LN%201%20S10.pdf

In short, objective misalignment in human incentive systems is pervasive. We should expect (and have indeed observed) the same problems in artificial incentive systems.
As we move toward more automatic labeling and supervision of AI systems, and as systems get better at optimizing the metrics we give them, I worry that AI designers are getting further and further removed from the objectives their systems actually optimize.
Making progress on this will require, at its core, a shift in the way we frame AI problems: AI systems need to be designed and built with the recognition that the metrics and objectives we design are _always_ misspecified in some way.
This is a cultural shift with technological consequences. Labels and proxies are almost always a *subjective* expression of desired system behavior. In contrast, students of ML/AI are trained to think of labels as *objective* properties of the world/problem framing.
Making this shift will help reduce harms from AI systems because it emphasizes the responsibility designers have over system behavior.
For example, researchers often talk about a trade-off between performance and fairness. To my eye this is a false choice caused by 1) a misaligned performance metric; and 2) a problem framing that treats that metric as the ground truth definition of performance.
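Here is a minimal sketch of that false choice (again with synthetic data and hypothetical names): two decision rules are scored both against a biased historical proxy and against the label we actually care about. The rule with equal selection rates across groups looks like a performance loss under the proxy metric, yet it is strictly better under the intended one.

```python
# Sketch of the "false choice": the apparent fairness/performance trade-off
# depends entirely on which label "performance" is measured against.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
skill = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

should_interview = (skill > 0.5)                  # label we want
was_interviewed = (skill - 1.0 * group > 0.5)     # biased historical proxy

rule_proxy_optimal = (skill - 1.0 * group > 0.5)  # mirrors past decisions
rule_group_blind = (skill > 0.5)                  # equal treatment of groups

for name, rule in [("proxy-optimal", rule_proxy_optimal),
                   ("group-blind", rule_group_blind)]:
    print(name,
          "| selection rate by group:",
          [round(rule[group == g].mean(), 2) for g in (0, 1)],
          "| 'performance' vs. proxy:", round((rule == was_interviewed).mean(), 2),
          "| performance vs. intent:", round((rule == should_interview).mean(), 2))
```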
Beyond that, this shift will help AI systems navigate the jump from laboratories to industry more readily. @sh_reya’s recent reflections on dataset curation, in the context of gaps between research and application, are highly relevant here: https://www.shreya-shankar.com/making-ml-work/
That is all for now. Thank you for reading what became a much longer thread than I intended.

I’m including a couple relevant references to my work below for those who are interested.
In “Inverse Reward Design,” @SmithaMilli, @ancadianadragan, @pabbeel, Stuart Russell and I explore the implications of treating a proxy objective as information about an unobserved goal.

https://arxiv.org/abs/1711.02827 
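For readers who want the gist as math, here is a rough sketch of the core inference, written from memory and in my own notation, so treat the details as approximate: the proxy reward $\tilde{w}$ is modeled as likely to the extent that optimizing it in the training environment $\tilde{M}$ produces high true reward under $w^{*}$, and the agent inverts this observation model to maintain a posterior over the true reward.

```latex
% Sketch of the Inverse Reward Design observation model (notation approximate).
% phi(xi) are trajectory features; pi(w, M) is the behavior induced by
% optimizing reward weights w in MDP M; beta measures designer near-optimality.
\begin{align}
  P(\tilde{w} \mid w^{*}, \tilde{M})
    &\propto \exp\!\big(\beta \, w^{*\top} \,
       \mathbb{E}\big[\phi(\xi) \;\big|\; \xi \sim \pi(\tilde{w}, \tilde{M})\big]\big) \\
  P(w^{*} \mid \tilde{w}, \tilde{M})
    &\propto P(\tilde{w} \mid w^{*}, \tilde{M}) \, P(w^{*})
\end{align}
```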
In “Incomplete Contracting and AI Alignment,” @ghadfield and I explore the connections between the substantial economics literature on incomplete contracts and the analogous problem of misspecified objectives in AI/ML.

https://arxiv.org/abs/1804.04268 
I summarized some of these ideas in a blog post from 2017.

Reading back over it, I think it over-emphasizes the technical framing of the issue at times, but it still captures many of the important ideas. https://bair.berkeley.edu/blog/2017/08/17/cooperatively-learning-human-values/