agreed with everything in this thread!
my supplementary hot take: deep RL is built upon a number of fundamentally broken ideas and we need to take a step back and question the basic principles. we simply don't have good RL algorithms to build on yet. https://twitter.com/jachiam0/status/1328740358092591104
on the plus side, deep RL has famously demonstrated that it *is* possible to combine basic ideas from RL and neural nets to get non-trivial results on challenging sequential decision-making problems...
... but the fact that these results often hold only for a handful of cherry-picked random seeds suggests that the methods work by accident more than by good design. in every principled field of study, "it works" should mean "it works with probability close to 1".
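(to make the seed point concrete, here's a minimal sketch of what "report success probability, not best seed" looks like in practice; `train_and_eval`, the seed count, and the success threshold are all hypothetical stand-ins, not anyone's actual protocol:)

```python
# minimal sketch of multi-seed evaluation. `train_and_eval` is a
# hypothetical placeholder for any deep RL training run that returns
# a final score; the point is to report the success *rate* across
# many seeds instead of the single best seed.
import random
import statistics

def train_and_eval(seed: int) -> float:
    # placeholder for a real training run; we fake a noisy outcome
    # here so the script stays self-contained and runnable.
    rng = random.Random(seed)
    return rng.gauss(mu=150.0, sigma=80.0)  # pretend episodic return

SEEDS = range(20)      # 3-5 seeds is common in papers; 20+ gives a usable estimate
THRESHOLD = 100.0      # task-specific "it works" bar (assumed for illustration)

scores = [train_and_eval(s) for s in SEEDS]
successes = sum(score >= THRESHOLD for score in scores)

print(f"mean return: {statistics.mean(scores):.1f} "
      f"(+/- {statistics.stdev(scores):.1f})")
print(f"success rate: {successes}/{len(scores)} = {successes / len(scores):.0%}")
```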
luckily, RL theory is enjoying a renaissance as we speak, producing a number of innovative ideas that will soon start to influence the practice of RL. i am optimistic that we can leave the harmful trends behind and start building reliable algorithms on better foundations.