Inspired by Turing's game and Winograd's SHRDLU, a team of researchers @DeepMind worked for a year on a learning agent that could interact with a 3D simulated world and a human operator - with both agent and human generating and responding to language
Creating an interactive grounded language agent is one of the classic problems of AI, and it's really hard!
As anyone who has worked on dialogue systems knows, there is no score to optimise and no clear definition of success 2/N
One of the hardest learning challenges turned out to be learning a single model to both move and generate language. Even with loads of data to imitate (and we collected a lot!) this is a really hard credit assignment problem. 3/N
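For readers who want something concrete: here's a minimal sketch (PyTorch; the LSTM torso and all names and sizes are my illustrative assumptions, not the paper's actual architecture) of a single policy with both a motor head and a language head, which is where that credit-assignment tension comes from:

```python
# Minimal sketch (not the paper's architecture): one policy with a shared
# recurrent torso and two heads, one for motor actions and one for words.
# All names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroundedAgentPolicy(nn.Module):
    def __init__(self, obs_dim=128, vocab_size=1000, n_motor_actions=16, hidden=256):
        super().__init__()
        self.torso = nn.LSTM(obs_dim, hidden, batch_first=True)  # shared memory
        self.motor_head = nn.Linear(hidden, n_motor_actions)     # movement logits
        self.word_head = nn.Linear(hidden, vocab_size)           # language logits

    def forward(self, obs_seq):
        h, _ = self.torso(obs_seq)                    # (B, T, hidden)
        return self.motor_head(h), self.word_head(h)

# Imitation learning: one cross-entropy loss per head. A single gradient step
# must apportion blame between "moved wrong" and "said the wrong word",
# which is the credit-assignment difficulty mentioned above.
policy = GroundedAgentPolicy()
obs = torch.randn(4, 10, 128)                         # dummy observation sequence
motor_logits, word_logits = policy(obs)
motor_targets = torch.randint(0, 16, (4, 10))
word_targets = torch.randint(0, 1000, (4, 10))
loss = (F.cross_entropy(motor_logits.reshape(-1, 16), motor_targets.reshape(-1))
        + F.cross_entropy(word_logits.reshape(-1, 1000), word_targets.reshape(-1)))
loss.backward()
```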
Evaluation is also a huge challenge. Outside of a few special cases, there is no general way to automatically check whether natural-language instructions have been satisfied or questions answered.
Instead, we trained reward models to infer success or failure from data 4/N
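Concretely, a reward model here can be as simple as a binary classifier over (instruction, trajectory) pairs. A minimal sketch, assuming embedded inputs and human success/failure labels (architecture, names, and shapes are illustrative, not necessarily what the paper uses):

```python
# Minimal sketch, assuming a reward model that scores whether a trajectory
# satisfied an instruction. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, instr_dim=64, traj_dim=128, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(instr_dim + traj_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),              # logit for "instruction satisfied"
        )

    def forward(self, instr_emb, traj_emb):
        return self.net(torch.cat([instr_emb, traj_emb], dim=-1)).squeeze(-1)

# Trained as a binary classifier on human-annotated successes and failures.
rm = RewardModel()
instr, traj = torch.randn(8, 64), torch.randn(8, 128)  # dummy embeddings
labels = torch.randint(0, 2, (8,)).float()             # 1 = success, 0 = failure
loss = F.binary_cross_entropy_with_logits(rm(instr, traj), labels)
loss.backward()
```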
We also used these reward models as a signal to improve the policies our agent learns by imitation: fine-tuning them with RL against the reward models generally improves performance 5/N
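Here's a minimal sketch of that fine-tuning loop, written as a plain REINFORCE update that scores rollouts with the learned reward model (reusing the GroundedAgentPolicy and RewardModel sketches above; the paper's actual RL algorithm may well differ):

```python
# Minimal sketch of RL fine-tuning: a REINFORCE step using the learned
# reward model's score as the return. Illustrative, not the paper's algorithm.
import torch

def reinforce_step(policy, reward_model, obs_seq, instr_emb, optimizer):
    motor_logits, _ = policy(obs_seq)                  # (B, T, n_actions)
    dist = torch.distributions.Categorical(logits=motor_logits)
    actions = dist.sample()                            # sampled rollout actions
    traj_emb = obs_seq.mean(dim=1)                     # stand-in trajectory summary
    with torch.no_grad():
        r = torch.sigmoid(reward_model(instr_emb, traj_emb))  # inferred success prob
    # Policy gradient: raise the log-probability of actions taken in
    # trajectories the reward model judges successful.
    loss = -(dist.log_prob(actions).sum(dim=1) * r).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g. reinforce_step(policy, rm, torch.randn(4, 10, 128), torch.randn(4, 64),
#                     torch.optim.Adam(policy.parameters(), lr=1e-4))
```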
In case this thread is a bit dry, here's a video of the agent learning bit by bit.
Its interpretation of commands slowly gets better, and the words it produces become more and more relevant to its immediate context as it learns 6/N
A final detail: semi-supervised learning really helped. It seems to give the model a sense of objects, how they can be interacted with, and how they bind to words.
From another perspective, it helps the model overcome the hard credit-assignment problem I mentioned above 7/N
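One way to picture this kind of semi-supervised signal, purely as an illustrative stand-in rather than the paper's actual objective: a contrastive loss that matches visual observations to the words uttered alongside them, so the model can learn how objects bind to words without any action labels:

```python
# Illustrative stand-in for a semi-supervised auxiliary objective (not
# necessarily the paper's): a symmetric InfoNCE loss matching visual
# observations to the words uttered alongside them.
import torch
import torch.nn.functional as F

def cross_modal_matching_loss(vision_emb, word_emb, temperature=0.1):
    """vision_emb, word_emb: (B, D) embeddings of paired frames/utterances."""
    v = F.normalize(vision_emb, dim=-1)
    w = F.normalize(word_emb, dim=-1)
    logits = v @ w.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(v.size(0))                # matched pairs on the diagonal
    # Each frame should retrieve its utterance, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = cross_modal_matching_loss(torch.randn(8, 128, requires_grad=True),
                                 torch.randn(8, 128, requires_grad=True))
loss.backward()
```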
Note also: the agent's knowledge seems relatively general. It can interpret commands or questions it has never seen before (given experience of the words in question).
Whether it meets the standards of systematicity required by @LakeBrenden and others remains to be seen! 8/N
There is a long way to go with this. The agent is far from perfect. E.g. it breaks down after 4-5 instructions or questions, and it can't deal with co-reference between utterances. We hope the write-up provides insight for others working on these or similar problems
Paper: http://bit.ly/2IDNHK6
And I was only a small part of a big team that made this happen. Not all are on here but @santoroAI @ArthurBrussee @SavinovNikolay @_timharley are, and I hope they tell me who else to tag!