Specifically, we’d like to be able to fit any kind of model you could write in BUGS (i.e., any directed acyclic graphical model expressible with side arithmetic and a suite of known distributions).

I’m also checking out Theano, which has C code generators and even a CUDA (Nvidia GPU) linker. I can’t see that it has a full LAPACK-like matrix lib, though, and we’re going to need to do some hairy matrix ops for hyperprior covariance matrices.

We can’t really do what we want easily directly in numpy because the functions aren’t easily vectorizable. Matt Hoffman’s been using Cython to speed up the basic Python implementation of HMC; I don’t know if he’s tried the type defs, which appear to be critical for Cython speed if you want to achieve “almost the speed of C” (and by “almost”, it looks like a factor of 2 or so slower, which we could probably live with).

Let me try to sell you on NumPy and PyMC: it’s actually not very difficult to write arbitrary C extensions for numpy. Cython is easy to use and compiles directly into C while interfacing nicely with NumPy, giving you the speed where it counts (I do this at work sometimes). This gives you all the benefits of numpy plus the ability to use the existing PyMC framework for doing MCMC. The downside is that derivatives would have to be custom written for the C extensions you create.
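To make the custom-derivative point concrete, here is a sketch of my own (not PyMC code): a Gaussian linear-model log density paired with its hand-derived gradient, checked against finite differences. This pairing is exactly what you would have to write and maintain by hand for each C extension.

```python
import numpy as np

def log_density(beta, X, y, sigma):
    # Gaussian linear-model log density, up to an additive constant:
    # -||y - X beta||^2 / (2 sigma^2)
    resid = y - X @ beta
    return -0.5 * np.dot(resid, resid) / sigma**2

def grad_log_density(beta, X, y, sigma):
    # Hand-derived gradient: X^T (y - X beta) / sigma^2.
    resid = y - X @ beta
    return X.T @ resid / sigma**2

# Sanity check against central finite differences.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
beta = rng.normal(size=3)
g = grad_log_density(beta, X, y, 1.0)
eps = 1e-6
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    fd = (log_density(beta + e, X, y, 1.0)
          - log_density(beta - e, X, y, 1.0)) / (2 * eps)
    assert abs(fd - g[j]) < 1e-4
```

With AD, only `log_density` would need to be written; without it, every change to the model means re-deriving and re-testing the gradient.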

For the kinds of models we have with lots of arbitrary designs in the data matrix, it’s hard to vectorize in something like numpy or R. If we have to evaluate our functions in Python, we’re sunk speed-wise.
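For contrast, here is what the vectorization win looks like in the cases that do vectorize; this is a toy i.i.d. normal likelihood of my own, not one of the problematic design-matrix models. The loop version pays interpreter overhead per observation; the vectorized version pushes the same arithmetic into C.

```python
import numpy as np

def loglik_loop(y, mu, sigma):
    # One Python-level iteration per observation: interpreter
    # overhead dominates at scale.
    total = 0.0
    for yi in y:
        z = (yi - mu) / sigma
        total += -0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return total

def loglik_vec(y, mu, sigma):
    # Same computation as one array expression.
    z = (y - mu) / sigma
    return np.sum(-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi))

y = np.random.default_rng(1).normal(size=1000)
assert np.isclose(loglik_loop(y, 0.0, 1.0), loglik_vec(y, 0.0, 1.0))
```

When the model structure forces per-element control flow, as with arbitrary designs, this rewrite isn't available, which is the complaint above.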

Unfortunately, I don’t have much constructive advice for you. My C kung fu is weak, but I have looked around at the AD packages for both C and Python, and haven’t found anything that really seems easy to learn.

Our aims diverge a bit because I am interested in making tools for general use, while you sound like you want to make something for a specific use. Because I want something for general use, I have even more requirements: I want the AD package to work well with NumPy arrays, supporting efficient broadcasting, the numpy linear-algebra functions, etc., all for at least second derivatives. AlgoPy (http://pypi.python.org/pypi/algopy/0.3.0) may come close to this, but I haven’t had a chance to look into it.
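For what it’s worth, the core trick forward-mode AD packages implement can be sketched in a few lines of plain Python using dual numbers. This is my own illustration, not AlgoPy code (AlgoPy’s actual machinery, based on truncated Taylor arithmetic, is more general and handles the higher derivatives and linear algebra mentioned above):

```python
import math

class Dual:
    """Forward-mode AD value: carries f and df/dx together."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule applied automatically.
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def sin(x):
    # Chain rule for an elementary function.
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# d/dx [x*x + sin(x)] at x = 1.5 should be 2x + cos(x).
x = Dual(1.5, 1.0)      # seed: dx/dx = 1
y = x * x + sin(x)
assert abs(y.dot - (2 * 1.5 + math.cos(1.5))) < 1e-12
```

The hard part, which is where the packages differ, is extending this idea to whole array operations and second derivatives rather than scalar arithmetic.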

Our high level goal is widely shared: sampling from the posterior of a large-ish multilevel generalized linear model. For example, in Gelman et al.’s voting models, we have a few thousand predictors (mostly interaction terms) arranged into a few dozen levels, with parameters at each level getting multivariate normal (or t) priors which are themselves given scaled inverse Wishart hyperpriors (or maybe some other things we’ve been playing around with).
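The workhorse computation at each level of such a model is a multivariate normal log density for that level’s parameter block. As a sketch of my own (assuming the usual Cholesky-based evaluation, which avoids forming an explicit inverse):

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    # log N(x | mu, Sigma) via a Cholesky factor L, Sigma = L L^T.
    d = len(mu)
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, x - mu)               # whitened residual
    logdet = 2.0 * np.sum(np.log(np.diag(L)))    # log |Sigma|
    return -0.5 * (d * np.log(2 * np.pi) + logdet + z @ z)

# Standard normal at the origin: -d/2 * log(2*pi).
d = 3
val = mvn_logpdf(np.zeros(d), np.zeros(d), np.eye(d))
assert np.isclose(val, -0.5 * d * np.log(2 * np.pi))
```

This is where the "hairy matrix ops" requirement bites: an AD tool has to differentiate through the Cholesky factorization and triangular solve, not just through scalar arithmetic.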

We’re looking at Hamiltonian Monte Carlo and would like to use AD to compute the gradients if it’s feasible. For AD to be worth the trouble, I need it to work with matrix libs (inverses, eigenvalues, etc.), stat libs (multivariate normals, inverse Wisharts, etc.), and math libs (log gamma functions, etc.).
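For reference, the piece of HMC those gradients feed is the leapfrog integrator. A toy sketch of mine, on a standard normal target, showing the near-conservation of the Hamiltonian that makes HMC proposals land in high-acceptance regions:

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    # Standard leapfrog: half momentum step, alternating full steps,
    # closing half momentum step.
    p = p - 0.5 * eps * grad_U(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

# Target: standard normal, so U(q) = q^2/2 and grad U(q) = q.
U = lambda q: 0.5 * np.dot(q, q)
grad_U = lambda q: q

q0, p0 = np.array([1.0]), np.array([0.5])
q1, p1 = leapfrog(q0, p0, grad_U, 0.01, 100)

# H = U(q) + p^2/2 drifts only at O(eps^2) over the trajectory.
H0 = U(q0) + 0.5 * np.dot(p0, p0)
H1 = U(q1) + 0.5 * np.dot(p1, p1)
assert abs(H1 - H0) < 1e-3
```

Here `grad_U` is trivial by hand; for the multilevel models above it is exactly the gradient one would want AD to supply.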

I’ve spent a few days poring over autodiff.org and the references on Wikipedia, but can’t seem to find anything that exactly fits what we need. I’ve installed packages and gotten the basics to run, but I can’t find anything that does reverse mode and links against extensive math libs.
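The core of reverse mode itself is small; here is a minimal expression-graph sketch of my own in Python. What the packages actually differ on is the library coverage above, not this mechanism:

```python
import math

class Var:
    """Reverse-mode AD node: value, parent links, and local partials."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.grad = val, parents, 0.0
    def __add__(self, o):
        return Var(self.val + o.val, [(self, 1.0), (o, 1.0)])
    def __mul__(self, o):
        return Var(self.val * o.val, [(self, o.val), (o, self.val)])

def sin(x):
    return Var(math.sin(x.val), [(x, math.cos(x.val))])

def backward(out):
    # Topologically order the graph, then sweep adjoints in reverse.
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, partial in node.parents:
            parent.grad += node.grad * partial

# z = x*y + sin(x): dz/dx = y + cos(x), dz/dy = x.
x, y = Var(1.5), Var(2.0)
z = x * y + sin(x)
backward(z)
assert abs(x.grad - (2.0 + math.cos(1.5))) < 1e-12
assert abs(y.grad - 1.5) < 1e-12
```

One reverse sweep yields the gradient with respect to all inputs at once, which is why reverse mode is the right fit for log posteriors with thousands of parameters.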

Oh, we’d like to do this in C, so it’s efficient at the scale we’re looking at, and we’d also like to use a package with a license that lets us redistribute.

One point I don’t agree with is that the problem is the curse of dimensionality. Currently I think the _curse of reversibility_ is the bigger issue. Statisticians seem enamored of detailed balance because it makes life easy, but the flip side is that it is equivalent to reversibility, and that is a killer. Applying a fixed reversible transition kernel gives geometric convergence. We’ve got to do better than that.
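To make the detailed-balance point concrete, a small numerical check of my own: a Metropolis kernel on three states with a symmetric proposal satisfies pi_i P_ij = pi_j P_ji, i.e., it is reversible by construction.

```python
import numpy as np

# Target distribution on 3 states; uniform (symmetric) proposal.
pi = np.array([0.2, 0.3, 0.5])
Q = np.full((3, 3), 1.0 / 3.0)

# Metropolis kernel: accept a proposed move with prob min(1, pi_j/pi_i).
P = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            P[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()   # rejected mass stays put

# Detailed balance: flow i -> j equals flow j -> i under pi.
for i in range(3):
    for j in range(3):
        assert np.isclose(pi[i] * P[i, j], pi[j] * P[j, i])
```

Detailed balance makes proving that pi is stationary a one-line calculation, which is the "makes life easy" part; the complaint above is that insisting on it rules out the non-reversible dynamics that could mix faster.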

I hope you are heartened to hear that there are indeed active research groups working hard on getting gradient-based methods into MCMC, including the groups whose work you listed. A few others you might be interested in are the delayed acceptance method that can use Jacobians (http://ba.stat.cmu.edu/journal/2010/vol05/issue02/christen.pdf) and a conjugate direction sampler that uses gradients, big-time (http://www.physics.otago.ac.nz/reports/electronics/ETR2008-1.pdf).

My guess is that the real gains are yet to be made, and will come when provably convergent non-reversible MCMC samplers make effective use of gradient information. We have interesting times ahead.
