A thing that came up at my day-job that is driving me a little crazy: let's say I want to discover the highest-scoring sequence of moves in a deterministic one-player video game. Surely there's something better than a brute trajectory tree search?
The problem of discovering a high-scoring sequence of moves appears in the RL literature as a kind of degenerate RL problem -- a warning about what an RL problem turns into if you don't randomize the environment or distinguish the train environment from the test environment.
And there's a simple trajectory tree search algorithm, The Brute, that's used to illustrate the point: it outperforms most RL algorithms at discovering a high-scoring sequence of moves in deterministic one-player games.
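For concreteness, here's a minimal sketch of that style of search -- not The Brute's exact algorithm, just the core idea: replay the best-known action sequence, deviate randomly with some probability, and lean on determinism so any replayed prefix lands in the same state. The `ToyEnv`, function names, and parameters below are all made up for illustration.

```python
import random

class ToyEnv:
    """A tiny deterministic 'game': reward 1 for matching the step parity, else 0."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
    def step(self, action):
        reward = 1 if action == (self.t % 2) else 0
        self.t += 1
        return reward, self.t >= 10  # (reward, done)

def brute_search(env, n_actions=2, episodes=200, eps=0.25, seed=0):
    rng = random.Random(seed)
    best_return = {}  # action prefix (tuple) -> best episode return seen through it
    best_seq, best_score = (), float("-inf")
    for _ in range(episodes):
        env.reset()
        prefix, total, done = (), 0, False
        while not done:
            # exploit: follow the child with the highest recorded return;
            # explore: at unseen nodes, or with probability eps, act randomly
            children = [prefix + (a,) for a in range(n_actions)]
            known = [c for c in children if c in best_return]
            if known and rng.random() > eps:
                a = max(known, key=lambda c: best_return[c])[-1]
            else:
                a = rng.randrange(n_actions)
            prefix += (a,)
            r, done = env.step(a)
            total += r
        # determinism means this exact sequence always replays to the same
        # return, so record the best return seen at every prefix of it
        for i in range(1, len(prefix) + 1):
            node = prefix[:i]
            if best_return.get(node, float("-inf")) < total:
                best_return[node] = total
        if total > best_score:
            best_score, best_seq = total, prefix
    return best_seq, best_score
```

The thing to notice is how much work determinism is doing: the greedy path is guaranteed to reproduce the best score found so far, so the search is a pure hill-climb over action sequences with no credit-assignment machinery at all.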
The thing I'm trying to do at my day-job is formally very similar to this 'degenerate' form of RL problem. This makes the literature search kind of a nightmare, because usually high-scoring-sequence discovery is exactly what you don't want to evaluate your agent on.
It's known that RL agents can sometimes far outperform brute trajectory tree search even at high-scoring-sequence discovery -- The Brute only beats RL in about half of the Atari games tested -- but it's not really known how to optimize RL for this.
I think there's an interesting theoretical question here: when is the best way to search a deterministic environment to learn it, and when shouldn't you bother?
Suppose there's a Mario Maker level that's a sequence of a thousand precise platform jumps, the platforms spaced a little differently each time. Blind tree search has to rediscover every jump from scratch, but a policy that has learned roughly how jumps work gets to amortize that across all thousand.