You are currently browsing jsalvati’s articles.

I wrote up some guided examples for the development version of PyMC 3 in IPython notebook, and they came out beautifully. A simple tutorial model, and a more impressive stochastic volatility example (see graphs towards the bottom). Comments or suggestions welcome.

STAN is looking for someone to write a Python wrapper for STAN. In the comments, Bob Carpenter has some suggestions for PyMC3:

They should include Matt’s no-U-turn sampler. And they might be interested in how we handled constrained variables and vectorizations and cumulative distributions for truncation.

Bob’s comment made me realize we haven’t talked nearly enough, so I really appreciate it. I have some questions and comments for Bob and the rest of the STAN team:

- Handling constrained variables by transforming them (STAN manual, p.6): PyMC3 does not currently implement sampling on a transformed space, but that feature was actually one of the motivations for the design of PyMC3. It would be quite simple to add it, given that you have the log Jacobian determinant for the transformation. Of course, the Jacobian is the non-trivial part. Any practical advice on when this has been most beneficial and by how much?
- Vectorization: Theano does pretty sophisticated vectorization which works the same way NumPy does.
- Cumulative distributions for truncation (STAN manual, p.145): this is a very cool idea, and should fit quite naturally in the PyMC3 framework.
- Radford Neal had some criticism of NUTS, saying it didn’t provide strong benefits. His criticisms seem good to me, but I haven’t thought very deeply about it. Are there good responses to his criticism? Perhaps good examples of where NUTS works significantly better than a well tuned HMC sampler? I usually tune HMC by changing the mass matrix rather than the step size or trajectory length.
- Basing the scaling matrix on the hessian matrix at a representative point (for example, the MAP) has often worked quite well for me. PyMC3 currently finds the hessian by differentiating the gradient numerically, but it’s also possible to calculate it analytically. Often, just the diagonal of the hessian is good enough. As I understand it, STAN is or perhaps was simply using the identity matrix as the mass matrix. Perhaps this accounts for why NUTS is so useful for STAN? This approach also allows you to supply either a covariance or hessian matrix; full or diagonal or sparse for either.
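
To make the transformed-space idea in the first bullet concrete, here is a minimal sketch, not PyMC3 or STAN code. It assumes a single positive parameter `sigma` with an Exponential(1) density (an arbitrary choice for illustration) and samples on `x = log(sigma)` instead. The function names are hypothetical. The only extra ingredient is the log Jacobian determinant, which for this transform is just `x`.

```python
import numpy as np


def logp_sigma(sigma):
    """Log density of the constrained parameter sigma > 0.

    Exponential(1) is an assumed example density, not anything from PyMC3.
    """
    return -sigma


def logp_unconstrained(x):
    """Log density of x = log(sigma) on the whole real line.

    sigma = exp(x), so |d sigma / dx| = exp(x) and the log Jacobian is x.
    """
    sigma = np.exp(x)
    return logp_sigma(sigma) + x
```

A sampler can then work on `x` without ever hitting the boundary at zero, and draws are mapped back with `sigma = np.exp(x)`. Leaving out the `+ x` term would silently target a different distribution.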

It’s probably hard moving his family to England from Canada to go work at the Bank of England, but the world is better for it.

There are a couple of cool papers on Hamiltonian Monte Carlo (HMC) that I’ve found recently.

- Quasi-Newton Methods for Markov Chain Monte Carlo – The author, Y. Zhang, adapts the classic optimization algorithm BFGS to yield an algorithm for automatically building a local mass matrix for HMC sampling from the history of posterior gradients (which are already required for HMC). There is also a limited memory (and computation cost) version of the algorithm, analogous to L-BFGS, which stores only a compact approximation of the hessian rather than the full matrix. This has the potential to make HMC require significantly less tuning. Currently, I usually use the hessian at the Maximum A Posteriori point as the mass matrix for HMC, but this will fail for distributions that are much more flat or peaked at the MAP than they are elsewhere in the distribution, and the Quasi-Newton approach may be more robust.
- Split Hamiltonian Monte Carlo – The authors, Shahbaba, Lan, Johnson and R. Neal, develop a technique for splitting the Hamiltonian into a fast part and a slow part. In Algorithm 1, they split the Hamiltonian into a normal approximation of the posterior (centered at the MAP and with covariance based on the inverse hessian at the MAP) and a residual component. The normal approximation can be simulated exactly and will contribute nothing to the error term, hopefully decreasing the overall error. When the distribution is close to a normal distribution, this technique should have improved efficiency (per sample) which should stay relatively constant as the number of dimensions increases. In the paper, the authors show pretty significant speed increases when doing logistic regression.
I wonder whether this type of split HMC will work very poorly on distributions that have pretty hard edges (like scale parameters), or more generally, thin tails. I think it would be fairly easy to get into a situation where the exact part of the simulation wants to move into a part of the space that is of really low probability.
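
The splitting idea can be sketched in a few lines. This is a simplified illustration rather than the paper's Algorithm 1: it assumes the parameters have already been shifted and rescaled so that the normal approximation is a standard normal (MAP at zero, identity covariance) and that the mass matrix is the identity. Under those assumptions the Gaussian part of the Hamiltonian is a harmonic oscillator whose exact flow is a rotation in (q, p) space, and only the residual potential needs gradient "kicks". The names `split_hmc_step` and `grad_residual` are hypothetical.

```python
import numpy as np


def split_hmc_step(q, p, grad_residual, eps):
    """One split-integrator step for H = q.q/2 + p.p/2 + U1(q).

    grad_residual(q) returns the gradient of the residual potential U1,
    i.e. whatever is left of -log posterior after removing q.q/2.
    """
    # Half step on the momentum using only the residual potential.
    p = p - 0.5 * eps * grad_residual(q)
    # Exact flow of the Gaussian part for time eps: a rotation in phase space.
    c, s = np.cos(eps), np.sin(eps)
    q, p = c * q + s * p, -s * q + c * p
    # Second half step on the momentum with the residual potential.
    p = p - 0.5 * eps * grad_residual(q)
    return q, p
```

When the residual is exactly zero the step conserves energy for any `eps`, which is the sense in which the normal part contributes nothing to the error. The hard-edge concern above shows up here as the rotation carrying `q` somewhere the residual potential is very large.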

I would love to see a synthesis between Split HMC and Zhang’s Quasi-Newton technique. A synthesis should allow both the computation (hessian related operations normally scale O(n**2)) and the acceptance probability to scale well as the number of dimensions increases.

I have recently implemented the Split Hamiltonian algorithm in my experimental Bayesian inference package, and I intend to implement the Quasi-Newton algorithm in the near future.

Since I have been unable to find a simple mathematical model of monetary disequilibrium, I’ve been interested in putting one together. Mathematical models help people narrow down exactly where disagreements lie and help make sure that their thinking isn’t confused. Since there’s a lot of disagreement and confusion in macroeconomics, I think simple mathematical models should be especially helpful here.

The model I’ve built consists of a large number of identical agents (I believe this is called a “representative agent” model) in an economy with two goods, backrubs and money. You can’t consume your own backrubs, so they have to be traded, and it still makes sense to have a market for them even though the agents are identical. In this model, money provides utility by facilitating trade: the more money agents have relative to the amount they’re buying, the more utility they get. The utility of money can vary over time. I’ve described the model more in depth in my write up.

Most macro models I’ve seen start at too high a level, taking aggregate demand/supply, interest rates, etc. as basic concepts instead of utility functions and optimization. To be as clear as possible, I’ve tried to make the model start from first principles as much as possible.

If you find an error or want to make a technical or non-technical suggestion, please let me know. Though the model is fairly simple (it uses only simple calculus), I haven’t done much economic modeling before, so I wouldn’t be very surprised to find errors.

I’ve written up the model here (LaTeX source). I also have an excel simulation of the model which I’m working on. I *think* it’s working right, but I haven’t checked it thoroughly yet (the draft is here). I’ll probably update these a bit in the future.

I would find it really useful to have a simple concrete mathematical model that demonstrates monetary disequilibrium. I could use it to troubleshoot my intuitions about monetary matters, develop new and better intuitions, and better explain the logic of monetary disequilibrium. Unfortunately, I haven’t run across such a model and it looks like my current math and modeling skills are insufficient to produce one myself. Does anyone have a paper, book or post that presents a mathematical model of monetary disequilibrium suited for at least one of these purposes?

Here’s an example of what I would expect such a model to look like:

An economy with a large number of agents of two types, each type producing a different good, and an infinite number of periods. Both types of agent have the same kind of utility function, which has a term for how much of each good they consume in each period, a term for how much of their production good they produce, and a term for the utility of money which is proportional to the amount they spend in each period. There should be some set of prices that characterize total equilibrium. We can investigate the effects of monetary disequilibrium by seeing how different price paths influence different agents’ utility, production and consumption over time.

Karl Smith claims

Money does not create anything. Value stored as money is value lost; lost because it represents resources not directed towards capital.

That said, it’s conceptually easy to make money a poor store of value: give it a large negative interest rate. This is necessary when the assets used to produce money (normally government bonds) have a low or negative interest rate in order to avoid having the central bank subsidize people’s holding of money.

Radford Neal and others had some interesting responses to my question about why Hamiltonian MCMC (HMC) might be better than Langevin MCMC (MALA). The gist of it seems to be that HMC is less random-walk like and thus mixes faster and has better scaling with number of dimensions.

Radford points to a survey paper of his (link) which discusses how the momentum distribution should be adjusted for changes in the scaling of the probability distribution (p. 22). This is something which I didn’t see last time I looked at HMC, and it’s necessary for an adaptive HMC algorithm. General use sampling algorithms can benefit a lot from being adaptive.

It also discusses tuning the step-count and step-size. This sounds rather difficult and non-linear.
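
For reference, here is a minimal sketch of a plain HMC update showing where the step-size, the step-count and the momentum scaling enter. It is not taken from the survey or from any package. It assumes a diagonal mass matrix passed as a vector `mass`, and all function and argument names are hypothetical. With `n_steps=1` the proposal reduces to a MALA-style step, which is one way to see why longer trajectories are less random-walk like.

```python
import numpy as np


def leapfrog(q, p, grad_logp, step_size, n_steps, inv_mass):
    """Leapfrog integration of Hamiltonian dynamics with a diagonal mass matrix."""
    p = p + 0.5 * step_size * grad_logp(q)       # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * inv_mass * p         # full step for position
        if i < n_steps - 1:
            p = p + step_size * grad_logp(q)     # full step for momentum
    p = p + 0.5 * step_size * grad_logp(q)       # final half step for momentum
    return q, p


def hmc_step(q, logp, grad_logp, step_size, n_steps, mass, rng):
    """One HMC update: draw momentum ~ N(0, diag(mass)), integrate, accept or reject."""
    p0 = rng.normal(size=q.shape) * np.sqrt(mass)
    q_new, p_new = leapfrog(q, p0, grad_logp, step_size, n_steps, 1.0 / mass)
    h_old = -logp(q) + 0.5 * np.sum(p0 ** 2 / mass)
    h_new = -logp(q_new) + 0.5 * np.sum(p_new ** 2 / mass)
    # Metropolis correction for the integration error.
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new
    return q
```

A typical call would pass `rng = np.random.default_rng(seed)` and set `mass` to roughly the inverse variance of each parameter, so that one `step_size` suits every direction. An adaptive scheme would update `mass` and `step_size` between calls, subject to the usual care about preserving the target distribution.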

I am going to try to implement an adaptive HMC algorithm in my multichain_mcmc package. I’d like to make this algorithm adaptive as I’ve done for my MALA implementation, though in general, this needs to be done carefully (see Atchadé and Rosenthal 2005).

I’m interested in RM-HMC as it promises automatic scale tuning and better efficiency scaling with high dimensions, but it looks like understanding it requires differential geometry, which I haven’t yet worked through. I believe it also requires 2nd derivatives (which provide scale information), which I haven’t yet figured out how to implement in an efficient and generic manner for PyMC. I suspect that would require a fork and redesign of PyMC.
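
One generic, though not efficient, way to get second derivatives when only gradients are available is to difference the gradient numerically. The sketch below is an illustration under that assumption and is not PyMC code. The function name and the step `h` are hypothetical choices. It costs two gradient evaluations per dimension, which is why it does not really answer the efficiency concern.

```python
import numpy as np


def hessian_from_gradient(grad, x, h=1e-5):
    """Approximate the hessian at x by central differences of the gradient."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.empty((n, n))
    for i in range(n):
        step = np.zeros(n)
        step[i] = h
        # Column i holds the derivative of the gradient along coordinate i.
        hess[:, i] = (grad(x + step) - grad(x - step)) / (2.0 * h)
    # Symmetrize to remove small asymmetries from rounding error.
    return 0.5 * (hess + hess.T)
```

Applied to the gradient of the negative log posterior at a representative point, the result (or just its diagonal) gives the kind of scale information discussed above.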

Economists frequently mention the idea of an Optimal Currency Area. Krugman does it. Barry Eichengreen does it. Even monetary equilibriumist Nick Rowe does it.

As I understand it, the idea is that monetary policy helps alleviate recessions. Because one area can be in a boom and another in a bust at the same time, it is useful to have small currency areas because then you can have more finely tuned monetary policy. This pushes the currency area that maximizes benefits (the optimal currency area) smaller. The fact that arranging trade with different currencies can be more expensive and that areas can have correlated business cycles pushes the optimal currency area bigger.

If you understand monetary economics from a monetary-equilibrium perspective, this should strike you as exceedingly odd.

First, let’s make some important distinctions. Let’s say a “recession” is a temporary decline in the production of market goods, without specifying its cause. The monetary equilibrium theorists note that a decrease in the quantity of money relative to the demand for money can cause such a temporary decline in production and has a negative effect on welfare (explanation). Any given recession might be due to monetary disequilibrium and/or other effects.

Monetary equilibrium theory implies that relieving monetary disequilibrium by adjusting the quantity of money to reflect changes in the demand for money is welfare enhancing because it avoids price adjustment costs as well as the costs of non-equilibrium production.

However, monetary equilibrium theory does *not* suggest that adjusting the quantity of money to respond to (temporary or non-temporary) changes in production for reasons other than monetary disequilibrium is welfare enhancing. If production of market goods falls because of a real productivity shock, increasing the quantity of money to compensate increases market good production but is welfare reducing because it adds adjustment costs and moves market good production away from its equilibrium level.

Thus, if Optimal Currency Areas are to make sense from a monetary disequilibrium perspective, it must be that different areas in the same currency zone can have monetary disequilibrium in the opposite directions.

The major purpose of the financial system is to move money (and other assets) from those who want them relatively less to those who want them relatively more. People who want to hold money relatively more than others borrow or sell assets and vice versa.

If the financial system is *not* doing this, then we *already* have two different currencies. Monetary policy conducted in the first area doesn’t have much of an effect on the second area and vice versa. The same bills in the first area may have a totally different price than in the second area. Making these two kinds of currency more readily distinguishable (by changing the “currency area”) would only make it harder for the whole economy to come to equilibrium.

I often see people express the idea that the production or destruction of money must necessarily cause problems for the economy because that money does not “represent new real wealth”. There are many variants of this notion, such as that “good money” must “represent” some real asset (like gold).

However, this notion is fundamentally confused.

First, notice that as a method of economic reasoning, “representation” is not great; there is no deep economic notion of “representation”. At best it could be a heuristic: you notice that money is not connected to particular real projects, think “huh, that’s weird” and decide to investigate further.

Next, notice that financial assets in general do not derive their value from “representing” some project or another. A financial asset derives its value from another party’s credible promise that the holder of the financial asset may receive something of value at some point in the future. For example, a corporation may issue bonds to undertake a new project and these bonds will have value, but the value is not derived from the project, the value is derived from the promise the corporation gives that the bonds will be honored. Such corporate bonds would have the same value whether the corporation issued them for a new profitable project or an unprofitable project or because of a clerical error, and they would cease to have value if the corporation’s promise went away.

Financial assets are useful because they are useful to either the issuer or the holder. Bonds allow businesses to undertake projects or smooth out cash flows; stocks allow businesses to get initial capital and allow investors to store resources; money helps lower transactions costs for people.

Finally, note that a financial asset is an asset to one party (the holder) and a liability to another party (the issuer). The subjective value of the asset to the holder may be larger or smaller than the subjective value of the liability to the issuer.

The US dollar has value because there are implicit (but credible) promises that it can be exchanged for something of value. These promises come from two sources: 1) the general public because they currently accept money as payment for other things of value 2) the Federal Reserve because they implicitly promise that they will trade dollars for something else of value in order to make sure that dollars continue to be valuable. Like other financial assets, its value has nothing to do with whether it represents real assets or not, and whether the economy would be better off with more or less of it has nothing to do with whether it “represents” real projects.

This was an attempt to address a popular confusion. I’m not totally satisfied with it, so if you have suggestions on how to improve it or know an article that does it better, let me know.