Hot take: deep RL research has stagnated because conferences have created bad incentives, rewarding researchers for vacuous claims of novelty, tenuous-at-best theoretical connections, or SOTA, while punishing boring analysis of the empirical tricks that actually make things work.
Example: Consider this paper on dense architectures for RL (D2RL) ( https://arxiv.org/abs/2010.09163 ), which came out this year. What it shows is that if you hold an algorithm constant and vary the NN architecture a certain way, performance improves.
Amazingly this holds across several algorithms in a wide variety of tasks. The change in task performance from just the architecture is significant.
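For concreteness, here's roughly what the architectural change looks like, as a minimal PyTorch sketch based on my reading of the paper: the raw network input gets concatenated back into every hidden layer, instead of only entering at the first layer as in a plain MLP. Layer widths, depth, and the example dimensions below are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class DenseMLP(nn.Module):
    """MLP where the raw input is concatenated into every hidden layer
    (the D2RL-style change), rather than only feeding the first layer.
    Sizes and depth here are illustrative, not the paper's exact settings."""
    def __init__(self, in_dim, out_dim, hidden=256, n_layers=4):
        super().__init__()
        layers = [nn.Linear(in_dim, hidden)]
        for _ in range(n_layers - 1):
            # every later layer sees [previous hidden activations, raw input]
            layers.append(nn.Linear(hidden + in_dim, hidden))
        self.hidden_layers = nn.ModuleList(layers)
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.relu(self.hidden_layers[0](x))
        for layer in self.hidden_layers[1:]:
            h = torch.relu(layer(torch.cat([h, x], dim=-1)))
        return self.out(h)

# e.g. a critic for a continuous-control task, input = concat(state, action)
q_net = DenseMLP(in_dim=17 + 6, out_dim=1)
```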
Why should we care about this? Because usually deep RL algorithms are evaluated and compared to each other without ablations on network architecture. Results are presented for an architecture tuned for the specific algorithm and tasks used in the experiment.
It's been shown before that varying architecture (tweaking away from the well-tuned island) could hurt algorithm performance. (Figure 2: https://arxiv.org/abs/1709.06560 ) This alone should have been enough for us to notice architecture analysis was being undervalued.
The D2RL paper makes the need to investigate this even more salient, by showing you can get a *specific* architecture variant to improve performance *across tasks and algorithms.* Again: amazing.
What does ICLR think of this paper? Three out of four reviewers voted to reject this work for insufficient novelty. An incredibly important empirical result may not be published because reviewers have failed to understand the needs of the field. https://openreview.net/forum?id=mYNfmvt8oSv
It's worth noting that this paper is arriving several years later than it could have, and that delay matters: years of algorithm comparisons are called into question when a result like this comes up. Why wasn't this work done sooner? Because the field keeps punishing work like it.
This all makes me wonder if perhaps it's time for conferences to retire "technical novelty" as a standard. "Novelty of understanding" should be sufficient even if we arrive there using well-known tools.
P.S. If you have other examples of this kind of phenomenon playing out---good RL papers rejected for bad reasons---sound off, I'd like to hear about them.
You can follow @jachiam0.