Neal Radford and others had some  interesting responses to my question about why Hamiltonian MCMC (HMC) might be better than Langevin MCMC (MALA). The gist of it seems to be that HMC is less random-walk like and thus mixes faster and has better scaling with number of dimensions.

Radford points to a survey paper of his (link) which discusses how the momentum distribution should be adjusted for changes in the scaling of the probability distribution (p. 22). This is something which I didn’t see last time I looked at HMC, and it’s necessary for an adaptive HMC algorithm. General use sampling algorithms can benefit a lot from being adaptive.

It also discusses tuning the step-count and step-size. This sounds rather difficult and non-linear.

I am going to try to implement an adaptive HMC algorithm in my multichain_mcmc package. I’d like to make this algorithm adaptive as I’ve done for my MALA implementation, though in general, this needs to be done carefully (see Atchade and Rosenthall 2005).

I’m interested in RM-HMC as it promises automatic scale tuning and better efficiency scaling with high dimensions, but it looks like understanding it requires differential geometry, which I haven’t yet worked through. I believe it also requires 2nd derivatives (which provide scale information), which I haven’t yet figured out how to implement in an efficient and generic manner for PyMC. I suspect that would require a fork and redesign of PyMC.