A thing that came up at my day-job that is driving me a little crazy: let's say I want to discover the highest-scoring sequence of moves in a deterministic one-player video game. Surely there's something better than a brute trajectory tree search?
The problem of discovering a high-scoring sequence of moves appears in the RL literature as a kind of degenerate RL problem -- a warning about what an RL problem turns into if you don't randomize the environment or distinguish the train environment from the test environment.
And there's a simple trajectory tree search algorithm, The Brute, that's used to illustrate the point: it outperforms most RL algorithms at discovering a high-scoring sequence of moves in deterministic one-player games.
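For concreteness, here's a minimal sketch of that style of search -- not The Brute's exact algorithm, just the core idea: replay the best-known action sequence, deviate randomly with some probability, and lean on determinism so any replayed prefix lands in the same state. The `ToyEnv`, function names, and parameters below are all made up for illustration.

```python
import random

class ToyEnv:
    """A tiny deterministic 'game': reward 1 for matching the step parity, else 0."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
    def step(self, action):
        reward = 1 if action == (self.t % 2) else 0
        self.t += 1
        return reward, self.t >= 10  # (reward, done)

def brute_search(env, n_actions=2, episodes=200, eps=0.25, seed=0):
    rng = random.Random(seed)
    best_return = {}  # action prefix (tuple) -> best episode return seen through it
    best_seq, best_score = (), float("-inf")
    for _ in range(episodes):
        env.reset()
        prefix, total, done = (), 0, False
        while not done:
            # exploit: follow the child with the highest recorded return;
            # explore: at unseen nodes, or with probability eps, act randomly
            children = [prefix + (a,) for a in range(n_actions)]
            known = [c for c in children if c in best_return]
            if known and rng.random() > eps:
                a = max(known, key=lambda c: best_return[c])[-1]
            else:
                a = rng.randrange(n_actions)
            prefix += (a,)
            r, done = env.step(a)
            total += r
        # determinism means this exact sequence always replays to the same
        # return, so record the best return seen at every prefix of it
        for i in range(1, len(prefix) + 1):
            node = prefix[:i]
            if best_return.get(node, float("-inf")) < total:
                best_return[node] = total
        if total > best_score:
            best_score, best_seq = total, prefix
    return best_seq, best_score
```

The thing to notice is how much work determinism is doing: the greedy path is guaranteed to reproduce the best score found so far, so the search is a pure hill-climb over action sequences with no credit-assignment machinery at all.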
The thing I'm trying to do at my day-job is formally very similar to this 'degenerate' form of RL problem. This makes the literature search kind of a nightmare, because usually high-scoring-sequence discovery is exactly what you don't want to evaluate your agent on.
It's known that RL agents can sometimes far outperform brute trajectory tree search even at high-scoring-sequence discovery -- The Brute only beats RL in about half of the Atari games tested -- but it's not really known how to optimize RL for this.
I think there's an interesting theoretical question here: when is the best way to search a deterministic environment to learn it, and when shouldn't you bother?
Suppose there's a Mario Maker level that's a sequence of a thousand precise platform jumps, the platforms spaced a little differently each time. Blind tree search has to rediscover every jump from scratch, but a policy that has learned roughly how jumps work gets to amortize that across all thousand.