I don't like the skateboarding analogy to Hamiltonian Monte Carlo because it's too easy to misinterpret what the algorithm actually does.
My brain is running on fumes, but let's use those vapors to try to make the analogy a little bit less bad. https://twitter.com/ChelseaParlett/status/1351663338338488320
Yes one can think of a posterior distribution as defining a mathematical skate park, where the valleys correspond to regions of high posterior probability and the peaks correspond to regions of low posterior probability.
When your posterior is strongly informed then the skate park looks like a nice symmetric bowl, but when your uncertainties are complex then the park will take on a much wilder shape.
Our goal is then to characterize all of the features of the skate park, but unlike real life we can't really step back away far enough to see the entire skate park all at once. In fact we don't have any extended vision at all.
All we can do is query what the shape of the park is at a single point!
This is one of the important places where the usual analogy breaks down. Yes there's a virtual park but we don't experience it in the same way that we experience a real skate park.
Incidentally my autocorrect replaced a misspelt "skate park" with saké park and I want to go to there.
So we want to characterize the features of our skate park but we can't see beyond the shape of the park at our current location. What can we do? Let gravity do the work for us!
*Well kind of sort of gravity. Really we're relying on the local gradient of the park surface to tell us about how we should move from our current point, but that gradient and the force of gravity are basically the same.
One way to use gravity is to drop a marble and follow it.
Let's start with a rough marble with lots of friction. In this case the marble will fall to the lowest nearby point, which is exactly what the gradient descent algorithm will do.
Why a rough ball with friction? Well in order to be pulled by gravity our marble needs to have mass, but if it has mass and moves it gains momentum which the basic gradient descent algorithm does not have. In other words the gradient descent analogy is a bit of a stretch here.
Anyways, we drop this rough ball and it finds the lowest point. But neither that path nor the lowest point is all that great a characterization of the entire skate park. We need a marble that doesn't just stop at the lowest point.
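To make the rough marble concrete, here is a minimal sketch of gradient descent on an assumed toy park: a symmetric bowl whose height is U(q) = 0.5 * |q|^2, the negative log density of a standard normal. The names `grad_U` and `gradient_descent`, the step size, and the starting point are all made up for illustration.

```python
import numpy as np

def grad_U(q):
    # Slope of the assumed toy park U(q) = 0.5 * |q|^2 (a symmetric bowl).
    return q

def gradient_descent(q0, step_size=0.1, n_steps=200):
    # The rough marble: at each step it only looks at the local slope
    # and creeps downhill, carrying no momentum from one step to the next.
    q = np.array(q0, dtype=float)
    path = [q.copy()]
    for _ in range(n_steps):
        q = q - step_size * grad_U(q)
        path.append(q.copy())
    return np.array(path)

path = gradient_descent([2.0, -1.5])
final_distance = np.linalg.norm(path[-1])  # distance from the bottom of the bowl
```

The marble ends up sitting at the bottom of the bowl, and the path it took there is a single line through the park. Neither tells us how wide the bowl is.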
So we get rid of the friction by using a smooth marble that rolls perfectly. Yes the marble will roll towards the lowest point but as it does so it picks up momentum, and when it reaches the lowest point that momentum keeps it going beyond the nadir.
In fact if we don't just drop the ball but give it a little kick in a random direction then chances are that it will continue to bounce around the park in a chaotic trajectory. If we kick it again occasionally then we might even be able to explore the entire park.
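For the simplest possible park the frictionless marble's path can be written down exactly, which makes for a small sketch of the kick-and-roll idea. The assumptions here are a symmetric bowl U(q) = 0.5 * |q|^2, a marble of unit mass, and kicks drawn from a standard normal; `exact_flow` and `energy` are hypothetical helper names. In this special case the marble just rotates between position and momentum.

```python
import numpy as np

rng = np.random.default_rng(0)

def exact_flow(q, p, t):
    # Exact frictionless motion in the assumed bowl U(q) = 0.5 * |q|^2 with
    # unit mass: a harmonic oscillator, so (q, p) rotates through angle t.
    return q * np.cos(t) + p * np.sin(t), -q * np.sin(t) + p * np.cos(t)

def energy(q, p):
    # Height in the park plus kinetic energy of the marble.
    return 0.5 * q @ q + 0.5 * p @ p

# A frictionless marble neither gains nor loses energy as it rolls.
q0, p0 = np.array([2.0, -1.5]), np.array([0.3, 0.8])
q1, p1 = exact_flow(q0, p0, 0.7)
energy_drift = abs(energy(q1, p1) - energy(q0, p0))

# Kick, roll for a while, note where we are, repeat.
q = q0.copy()
visited = []
for _ in range(5000):
    p = rng.standard_normal(2)        # random kick (assumed standard normal)
    q, p = exact_flow(q, p, t=1.0)    # let the marble roll
    visited.append(q)
visited = np.array(visited)
spread = visited.var(axis=0)  # should be near 1 in each direction for this bowl
```

Unlike the rough marble, the kicked marble keeps visiting the whole bowl, and the spread of the visited points reflects how wide the bowl actually is.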
Buuuuuut the marble is just a marble. It doesn't have any memory to tell us all about the place it went. Instead we have to follow it and see everything that it sees.
So we grab our board and put on our pads -- or don't if we're rebels -- and try to follow the marble as it bounces around the park. Emphasis on _try_.
See we're not necessarily the greatest skateboarders, so our pursuit trajectory is a little wobbly. Small wobbles are okay -- we can correct for them later -- but if the marble starts making turns that are too sharp for us to follow then we're going to wipe out.
Now imagine that for some odd reason we have a giant rocket attached to our back and when we wipe out that rocket has a tendency to turn on and send us flying in wild directions.
When we wipe out we not only completely lose track of the marble and where we were _supposed_ to be going but also end up flying around the park completely uninfluenced by gravity.
This is a divergence -- it's not a problem with the exact trajectory but with our ability to follow that trajectory. The skate parks with wacky shapes generate trajectories that are harder to follow, but if we keep track of _where_ we went then we learn about some of that wackiness.
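Here is a rough sketch of the wobbly skateboarder, using the leapfrog integrator as the stand-in for how we follow the marble. The park is an assumed toy one that is wide in one direction and very narrow in the other. The names `leapfrog` and `energy_error`, the step sizes, and the threshold of 1000 are illustrative choices, not anything fixed by the analogy.

```python
import numpy as np

# Assumed toy park: wide in the first direction, very narrow in the second.
SCALES = np.array([1.0, 0.05])

def U(q):
    return 0.5 * np.sum((q / SCALES) ** 2)

def grad_U(q):
    return q / SCALES ** 2

def leapfrog(q, p, step_size, n_steps):
    # Follow the marble in discrete steps: half kick from the slope,
    # alternating full position and momentum updates, final half kick.
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_U(q)
    for i in range(n_steps):
        q += step_size * p
        if i < n_steps - 1:
            p -= step_size * grad_U(q)
    p -= 0.5 * step_size * grad_U(q)
    return q, p

def energy_error(q0, p0, step_size, n_steps=50):
    # The exact marble conserves energy, so any change in energy
    # measures how badly we failed to follow it.
    q1, p1 = leapfrog(q0, p0, step_size, n_steps)
    h0 = U(q0) + 0.5 * p0 @ p0
    h1 = U(q1) + 0.5 * p1 @ p1
    return abs(h1 - h0)

q0 = np.array([1.0, 0.05])
p0 = np.array([0.5, -0.5])

DIVERGENCE_THRESHOLD = 1000.0  # illustrative cutoff for "we wiped out"
wobble = energy_error(q0, p0, step_size=0.01)   # small steps: minor wobbles
wipeout = energy_error(q0, p0, step_size=0.2)   # steps too big for the narrow turn
diverged = wipeout > DIVERGENCE_THRESHOLD
```

With small steps the energy error stays tiny, which is the correctable wobble. With steps too large for the narrow direction the error explodes, which is the rocket turning on.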
To summarize:
Your posterior <-> Your skate park
Degenerate posteriors <-> Extreme skate parks
Gradient information <-> Pull of gravity
Gradient descent <-> Path of ball with infinite friction
Hamiltonian trajectory <-> Path of frictionless ball
Numerical Hamiltonian trajectory <-> Following a frictionless ball on a skateboard
Divergence <-> Wiping out trying to follow the ball and being rocketed off to the edges of the park
Saké Park <-> The first place I'm visiting after getting vaccinated