Silas Barta and I have a long ongoing debate (parts 1, 2 and 3; part 2 actually comes before part 1) about monetary economics, 2008 recession policy and the views of mainstream macro-economists. This post summarizes the progress of the debate:

Theoretical issues

I think I’ve convinced Silas of a couple of things: I’ve clarified the mechanism by which monetary disequilibrium works. I’ve convinced Silas that the non-monetary impacts of conducting monetary policy, meaning buying and selling financial assets with newly created money, are not large. I have convinced Silas that having the Fed try to adjust the quantity of money to accommodate changes in the demand for money is not a terrible policy, though I don’t think I’ve convinced him it’s a good policy.

Silas has convinced me that the possibility of a decrease in the demand for money due to a decrease in market activity (for example, a shift towards consuming leisure instead of consumer goods) should be taken seriously. I think the evidence strongly indicates that’s not what’s going on right now, but a good monetary system should be able to handle such a change. I am not sure what kind of rule would deal well with this case as well as more conventional cases.

2008 recession policy

Silas and I still disagree about whether the evidence suggests that a high demand for money relative to the quantity of money has been a major problem over the last ~2 years. I haven’t convinced Silas that TARP and similar policies are basically independent of monetary policy, meaning not recommended (or disrecommended) by standard macro as well as implementable independently of monetary policy. Silas and I also disagree about how bad TARP and similar policies were. I claim that they were not terrible but not great. Silas seems to think they were terrible, but I am not clear on why.

Mainstream macro-economists’ view of the world

Silas and I still disagree about whether mainstream macro-economists see surface level economic statistics (inflation, GDP, spending, loans, interest rates, unemployment etc.) as ends in themselves, rather than as indicators of the state of the economy. I say it is obvious that mainstream macro-economists understand this distinction, while Silas maintains he doesn’t see any evidence they do. Silas and I do agree that many mainstream macro-economists have a poor understanding of monetary economics, so that even if they do understand the distinction between surface level statistics and actual welfare, much of their advice will be bad.

Previously, I discussed the features which distinguish money from other goods (Money as a good), why you should view most money as a branded product, and how that affects the perspective you should take on central bank actions (Money as a product). I showed that it makes sense to talk about the best quantity of a particular money in the economy. Now I want to discuss one important process that affects what the best quantity of money is.

This process is called “monetary disequilibrium”, “excess cash balances mechanism” and probably some other things as well. The Keynesian concept of the “Paradox Of Thrift” is related, though less well developed. I will first describe the process informally. In later posts I will describe it more formally. My intent here is to give an intuitive explanation of the basics of monetary disequilibrium.

The real quantity of money that people would like to hold in equilibrium can change over time. Because prices are sticky this can have real effects in the economy. To see how, consider an economy initially at equilibrium with a fixed quantity of money and prices that adjust to changes only after some time (sticky prices). Some people in the economy decide they want to hold higher money balances than they had in the past:

When people hold less money than they would like, they try to increase their holdings of money in two ways: 1) try to reduce their spending 2) try to increase their income. The quantity of money is fixed, so if one person holds a higher nominal quantity of money than before, all others must hold a lower quantity of money than before in aggregate. Prices are fixed, so this is also true for the real quantity of money. When one person reduces their spending, they reduce the income of all others in aggregate. Unless those others desire to hold less money than before, they now hold less money than they would like. Now those others also try to increase their money holdings by the same means. This is a vicious circle, and aggregate spending and incomes decline. The circle ends when people no longer want to cut their spending to achieve higher money balances.

There are two effects which determine how far this process proceeds. 1) The quantity of money that people want to hold is positively related to the quantity they expect to spend, so as people expect to spend less they will need to hold somewhat less money. 2) As people reduce their spending, those reductions become more painful, so people will be more reluctant to trade off consumption for increased money balances.

This process reduces the real quantity of market transactions below its equilibrium level. The real quantity of market transactions can only return to normal when prices have adjusted to the new equilibrium, so that people can hold higher real money balances given the fixed nominal quantity of money.
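The process above can be illustrated with a toy numerical sketch. This is not a model from the literature; it assumes, purely for illustration, that desired money balances are a fixed fraction `k` of planned spending (effect 1 above) and that each period people close a fraction `adjust` of the gap between desired and actual balances by cutting spending. The names `simulate_spending`, `k_old`, `k_new` and `adjust` are hypothetical.

```python
def simulate_spending(money, k_old, k_new, adjust=0.5, periods=40):
    """Toy path of aggregate spending after money demand rises.

    Assumptions (illustrative only): prices and the nominal money
    stock are fixed, and desired balances equal k * spending.
    """
    spending = money / k_old  # initial equilibrium: k_old * spending == money
    path = [spending]
    for _ in range(periods):
        shortfall = k_new * spending - money  # desired minus actual balances
        # People cut spending to rebuild balances; in aggregate this only
        # lowers incomes, because the money stock cannot change.
        spending -= adjust * shortfall / k_new
        path.append(spending)
    return path


path = simulate_spending(money=100.0, k_old=0.25, k_new=0.30)
print(round(path[0], 1), round(path[-1], 1))
```

In this sketch spending stops falling only once it has shrunk to `money / k_new`, the level at which the unchanged money stock matches the higher desired balances. Nothing here models the price adjustment that eventually restores the original level of real transactions.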

This is the foundational insight of money-based macroeconomics. For some reason this process is not explained in introductory macroeconomics classes, nor commonly discussed by mainstream macro-economists. I believe understanding this logic is critical for understanding the effect of money in the economy and for understanding macroeconomic fluctuations.

Arnold Kling constantly says things that give me the impression that he does not really grok the money-based macro theories he criticizes. For example, he once stated

Pretty much everything in AS/AD is riding on the hypothesis that labor supply is highly elastic at the nominal wage and labor demand is reasonably elastic at the real wage.

Depending on what exactly he meant, this is either false or very misleading. There are certainly people who think it works this way, macro-economists even, but as Nick Rowe has explained, explanations that rely on the first order effects of real prices do not make sense. The only foundation for AS/AD-like models that makes any sense is some kind of monetary-disequilibrium theory. In a monetary disequilibrium theory (Sumner calls it the excess cash balances mechanism), if people hold lower real money balances than they would like, they try to accumulate higher money balances by reducing their spending or trying to increase their sales. Since one person’s spending is another’s income, an overall increase in the demand for money without an increase in the supply of money will lead to a decrease in overall spending (you can also call this a decrease in AD, though I don’t see the use).

The latest example is here (#2) (I was a tad too rude in the comments, and I apologize for that)

Yesterday in my high school econ class, I found myself trying to explain why having a separate currency that could depreciate would enable the PIIGS to live happily ever after. I made the textbook argument, but I found myself not so convinced. OK, so maybe you can tell a story where one country that has a recession and a large fiscal deficit would be better off with devaluation. But there are so many countries in that position right now, and they cannot all devalue.

Speaking of “cannot all devalue,” doesn’t the impact of the PIIGS crisis completely nullify QE2? If the dollar appreciates 10 percent and the foreign sector is 10 percent of the economy, then that represents 1 percent disinflation, which probably more than wipes out any inflationary impact of the Fed’s new bond buying program.

To me this just screams “missing the point”. Exchange rate effects are not how coherent money-based macro works. Neither are the traditional income/substitution effects (unless you mean substitution towards holding money). It’s monetary disequilibrium.

In my last post, Cyan brought up the issue that many practitioners of statistics might object to using prior information in Bayesian statistics. The philosophical case for using prior information is very strong, and I think most people intuitively agree that using prior information is legitimate, at the very least in selecting what kinds of models to consider. I think most statistics users would be OK with using prior information when there is some kind of objective prior distribution. However, people justifiably worry about bias or overconfidence on the part of the statistician; people don’t want the results of statistics to depend much on the identity of the statistician.

In practice, this problem is not too hard to sidestep. There are at least two approaches:

The first is to include significantly less prior information than is available, to make statistical inference robust to bias and overconfidence. The two common approaches to this are to use weakly informative priors or non-informative/maximum entropy priors. Weakly informative priors are very broad distributions that still include some prior information that almost no one would object to. For example, if you’re estimating the strength of a metal alloy, you might choose a prior distribution that expresses your belief that the alloy will probably be stronger than tissue paper but weaker than a hundred times the strength of the strongest known material. Maximum entropy priors represent the minimum it is possible to know about the parameters of interest.

The second is to do the calculations using several different prior distributions that different consumers of the statistics might think are relevant. This accomplishes something like a sensitivity analysis for the prior distribution. For example, you might include a non-informative distribution, a weakly informative distribution and a very concentrated prior distribution. This allows people with different prior opinions to choose the result that makes the most sense to them.
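A minimal sketch of this second approach, assuming a binomial success rate with conjugate Beta priors so that the posterior is available in closed form. The data (7 successes in 10 trials) and the three priors are made up for illustration, and the uniform Beta(1, 1) is only one of several common choices for a "non-informative" prior.

```python
def beta_binomial_posterior(prior_a, prior_b, successes, trials):
    """Return (a, b) of the Beta posterior for a binomial success rate."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 successes in 10 trials.
successes, trials = 7, 10

# Three priors different readers might hold, expressed as Beta(a, b).
priors = {
    "non-informative (uniform)": (1.0, 1.0),
    "weakly informative": (2.0, 2.0),
    "concentrated near 0.5": (50.0, 50.0),
}

posterior_means = {}
for name, (a, b) in priors.items():
    post_a, post_b = beta_binomial_posterior(a, b, successes, trials)
    posterior_means[name] = beta_mean(post_a, post_b)
    print(f"{name}: posterior mean = {posterior_means[name]:.3f}")
```

Reporting all three results side by side shows how much the conclusion depends on the prior: with this little data the concentrated prior dominates, while the two broad priors give similar answers.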

This post will be more technical than my previous post; I will assume familiarity with how MCMC techniques for sampling from arbitrary distributions work (an overview starts on page 24, this introduction is more detailed). This post is about a specific class of MCMC algorithms: derivative based MCMC algorithms. I have two goals here: 1) to convince people that derivative based MCMC algorithms will have a profound effect on statistics and 2) to convince MCMC researchers that they should work on such algorithms. The goal of my previous post was to provide motivation for why good MCMC algorithms are so exciting.

A friend of mine suggested that this post would make the basis of a good grant application for statistics or applied math research. I can only hope that he is correct and someone picks up that idea. I’d do anything I could to help someone doing so.

Some background

In my last post, I mentioned that one of the things holding Bayesian statistics back is the curse of dimensionality:

Although Bayesian statistics is conceptually simple, solving for the posterior distribution is often computationally difficult. The reason for this is simple. If P is the number of parameters in the model, the posterior is a distribution in P dimensional space. In many models the number of parameters is quite large so computing summary statistics for the posterior distribution (mean, variance etc.) suffers from the curse of dimensionality. Naive methods are O(N^P).

The posterior distribution is a probability distribution function over the whole space of possible parameter values. If you want to integrate numerically over a distribution with 10 parameters to calculate some statistic, say the mean, and you split up the space into 20 bins along each parameter-dimension, you will need a 10 trillion element array. Working with a 10 trillion element array is very expensive in terms of both memory and computer time. Since many models involve many more than 10 parameters and we’d like to have higher resolution than 1 in 20, this is a serious problem.
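The arithmetic behind that figure is easy to check. The sketch below assumes one 8-byte floating point number per grid cell, which is an assumption added here rather than something stated above; `grid_cells` is a hypothetical helper name.

```python
def grid_cells(bins_per_dimension, n_parameters):
    """Number of cells in a regular grid over the parameter space."""
    return bins_per_dimension ** n_parameters


cells = grid_cells(20, 10)
bytes_needed = cells * 8  # assuming one 8-byte float per cell
print(f"{cells:,} cells, roughly {bytes_needed / 1e12:.0f} TB")
```

Each additional parameter multiplies the grid size by another factor of 20, which is the O(N^P) growth mentioned above.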

Instead of integrating directly over the space, we can use Monte Carlo integration: sample from this probability distribution and use the samples to calculate our statistic (for example, averaging the points to calculate the mean). Markov Chain Monte Carlo (MCMC) can be used to sample from any probability distribution. MCMC works by starting from an arbitrary point in the space and then picking a random point that’s nearby. If that point is more likely than the current point, then that point is adopted as the current point. If it’s less likely than the current point, then it may still be adopted with a probability depending on the ratio of the likelihoods. If certain criteria are met (detailed balance), this process will eventually randomly move around the whole distribution in a way that is proportional to the likelihood; the process will sample from the distribution (though each successive point is not statistically independent from the previous one).
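The procedure just described can be sketched as a random-walk Metropolis sampler. This is a minimal one-dimensional illustration, not any particular package's implementation; the standard normal target, the step size and the burn-in length are arbitrary choices made for the example.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of a standard normal (stand-in target)."""
    return -0.5 * x * x


def metropolis(log_p, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)     # pick a nearby point
        log_ratio = log_p(proposal) - log_p(x)  # log of the density ratio
        # Always accept uphill moves; accept downhill moves with
        # probability equal to the density ratio.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


samples = metropolis(log_target, start=5.0, n_samples=20000)
kept = samples[2000:]  # discard the early, start-dependent samples
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

Because the Gaussian proposal is symmetric, the acceptance rule only needs the ratio of target densities, so the normalizing constant of the posterior is never required.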

Sounds great, but unfortunately naive MCMC does not solve our problem completely; in a high dimensional space, many more directions have decreasing probability than have increasing probability. If we pick a direction at random, we have to move slowly or wait a long time for a good direction. Assuming an n-dimensional, approximately normal distribution, naive MCMC algorithms are O(n) in the number of steps it takes to get an independent sample. Now O(n) doesn’t sound that bad, but if you take into account the fact that calculating the likelihood is often already O(n), it means that fitting many models takes O(n**2) time. This drastically limits the models which can be fit without having a lot of MCMC expertise, or integrating over the distribution analytically.

Derivative based MCMC algorithms

MCMC sampling has many similarities with optimization. In both applications, we have an often multi-dimensional function and we are most interested in the maxima. In optimization, we want to find the highest point on the function; in MCMC sampling we want to find regions of high probability and sample in those regions. In optimization, many functions are approximately quadratic near the optimum; in MCMC sampling, many distributions are nearly normal and the log of a normal distribution is quadratic (taking the log of the distribution is something you have to do anyway). In optimization, if you have a quadratic function, many algorithms will find a maximum in 1 step or very few steps regardless of the dimensionality of the function. They do this by using the first and second derivatives of the function to find a productive direction and magnitude to move in.

There is a class of MCMC algorithms which solve the curse of dimensionality by taking a lesson from optimization and using the derivatives of the posterior distribution to inform the step direction and size. This lets them preferentially consider the directions where probability is increasing using 1st derivative information and get a measure of the shape of the distribution using 2nd derivative information. Such algorithms perform much better than naive algorithms. They take larger step sizes, mix and converge faster. With respect to the number of parameters, Langevin MCMC algorithms (which use 1st derivative information) are O(n**(1/3)) (link), and Stochastic-Newton algorithms (which use 1st and 2nd derivative information and are analogous to Newton’s method) are O(1) (link). A Stochastic-Newton method will independently sample an approximately normal distribution in approximately one step, regardless of the number of parameters. This opens up a huge swath of the space of possible models for fitting without needing to do much math or needing much MCMC knowledge.
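To make the first-derivative idea concrete, here is a minimal sketch of a Metropolis-adjusted Langevin step. It is a textbook-style illustration, not the adaptive algorithm from any specific paper or package; the standard normal target, the step size `eps` and the five dimensions are assumptions chosen for the example.

```python
import math
import random


def log_p(x):
    """Unnormalized log density of a standard normal in len(x) dimensions."""
    return -0.5 * sum(v * v for v in x)


def grad_log_p(x):
    """Gradient of log_p; for a standard normal this is simply -x."""
    return [-v for v in x]


def log_q(to, frm, eps):
    """Log density (up to a constant) of proposing `to` from `frm`."""
    g = grad_log_p(frm)
    return -sum((t - f - 0.5 * eps * eps * gi) ** 2
                for t, f, gi in zip(to, frm, g)) / (2.0 * eps * eps)


def mala(start, n_samples, eps=0.9, seed=0):
    """Metropolis-adjusted Langevin sampler using first derivatives."""
    rng = random.Random(seed)
    x = list(start)
    samples, accepted = [], 0
    for _ in range(n_samples):
        g = grad_log_p(x)
        # Drift along the gradient, then add Gaussian noise.
        prop = [xi + 0.5 * eps * eps * gi + eps * rng.gauss(0.0, 1.0)
                for xi, gi in zip(x, g)]
        # The proposal is asymmetric, so the ratio includes the q terms.
        log_ratio = (log_p(prop) + log_q(x, prop, eps)
                     - log_p(x) - log_q(prop, x, eps))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x, accepted = prop, accepted + 1
        samples.append(list(x))
    return samples, accepted / n_samples


samples, accept_rate = mala(start=[3.0] * 5, n_samples=10000)
first_coord = [s[0] for s in samples[1000:]]
mean = sum(first_coord) / len(first_coord)
print(f"acceptance rate ~ {accept_rate:.2f}, mean of x[0] ~ {mean:.2f}")
```

The only change from a naive random walk is the gradient drift term and the matching correction in the acceptance ratio; the gradient steers proposals toward regions of higher probability instead of picking directions blindly.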

Derivative based MCMC algorithms have other advantages as well.

First, both 1st and 2nd derivative methods take much larger steps than naive methods. This means it is much easier to tell whether the chain is converging using the usual diagnostics. The downside of this is that such algorithms probably have different failure modes than naive algorithms and might need different kinds of convergence diagnostics.

Second, 2nd derivative algorithms are self tuning to a large extent. Because the inverse Hessian of the negative log posterior represents the covariance of the normal distribution which locally approximates the posterior, such algorithms do not need a covariance tuning parameter in order to work well.
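A one-dimensional sketch of a Newton-style proposal shows both points: there is no step size to tune, and for an exactly normal target every proposal is an independent draw that is accepted. The target (mean 2, standard deviation 3) and all function names are hypothetical, and this is a simplified illustration rather than the published Stochastic-Newton algorithm.

```python
import math
import random

MU, SIGMA = 2.0, 3.0  # hypothetical target: normal with mean 2, sd 3


def neg_log_p(x):
    """Negative log density of the target, up to a constant."""
    return 0.5 * ((x - MU) / SIGMA) ** 2


def grad(x):
    """First derivative of neg_log_p."""
    return (x - MU) / SIGMA ** 2


def hess(x):
    """Second derivative of neg_log_p (constant for a normal target)."""
    return 1.0 / SIGMA ** 2


def log_q(to, frm):
    """Log density (up to a constant) of the Newton proposal from `frm`."""
    h = hess(frm)
    center = frm - grad(frm) / h  # Newton step toward the mode
    return 0.5 * math.log(h) - 0.5 * h * (to - center) ** 2


def newton_sampler(start, n_samples, seed=0):
    rng = random.Random(seed)
    x, samples, accepted = start, [], 0
    for _ in range(n_samples):
        h = hess(x)
        # Proposal variance is the inverse Hessian: nothing to tune.
        prop = x - grad(x) / h + rng.gauss(0.0, 1.0 / math.sqrt(h))
        log_ratio = (-neg_log_p(prop) + log_q(x, prop)
                     + neg_log_p(x) - log_q(prop, x))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x, accepted = prop, accepted + 1
        samples.append(x)
    return samples, accepted / n_samples


samples, accept_rate = newton_sampler(start=50.0, n_samples=5000)
mean = sum(samples) / len(samples)
print(f"acceptance rate ~ {accept_rate:.2f}, mean ~ {mean:.2f}")
```

Even starting far out in the tail at 50, the first proposal is already centered on the mode. For targets that are only approximately normal the acceptance step does real work and the acceptance rate falls below one.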

The future of MCMC

The obvious problem with these methods is that they require derivatives, which can be time consuming to calculate analytically and expensive to calculate numerically (at least O(n)). However, there is an elegant solution: automatic differentiation. If you have analytic derivatives for the different component parts of a function and the analytic derivatives of the operations used to put them together, you can calculate the derivatives for the whole function using the chain rule. The components of the posterior distribution are usually well known distributions and algebraic transformations, so automatic differentiation is well suited to the task.
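As a rough illustration of the idea, here is a tiny forward-mode automatic differentiation sketch using dual numbers. It is not how PyMC or any other package implements derivatives; the `Dual` class and `normal_logpdf` function are hypothetical names, and only addition, subtraction and multiplication are supported.

```python
import math


class Dual:
    """A value together with its derivative (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def normal_logpdf(x, mu, sigma):
    """Log density of normal(mu, sigma) built from primitive operations."""
    z = (x - mu) * (1.0 / sigma)
    return -0.5 * (z * z) + (-math.log(sigma) - 0.5 * math.log(2 * math.pi))


x = Dual(1.5, 1.0)  # seed derivative dx/dx = 1
result = normal_logpdf(x, mu=0.5, sigma=2.0)
print(result.value, result.deriv)  # deriv should equal -(x - mu) / sigma**2
```

The derivative is never written out for `normal_logpdf` as a whole; it falls out of the rules attached to each primitive operation, which is why the approach composes well with models built from standard pieces.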

This approach fits in remarkably well with existing MCMC software, such as PyMC, which allows users to build complex models by combining common distributions and algebraic transformations and then select an MCMC algorithm to sample from the posterior distribution. Derivative information can be added to existing distributions so that derivative based MCMC algorithms can function.

I have taken exactly this approach for first derivative information in a PyMC branch used by my package multichain_mcmc, which contains an Adaptive Langevin MCMC algorithm. I graduated a year ago with an engineering degree, and I have never formally studied MCMC or even taken a stochastic processes class. I am an amateur, and yet I was able to put together such an algorithm for very general use; creating truly powerful algorithms for general use should pose little problem for professionals who put their minds to it.

There is a lot of low hanging research fruit in this area. For example, the most popular optimization algorithms are not pure Newton’s method because it is a bit fragile; the same is likely true in MCMC, for the same reasons. Thus it is very attractive to look at popular optimization algorithms for ideas on how to create robust MCMC algorithms. There’s also the issue of combining derivative based MCMC algorithms with other algorithms with desirable properties. For example, DREAM (also available in multichain_mcmc) has excellent mode jumping characteristics; figuring out when to take DREAM-like steps for best performance is an important question.

Given its potential to make statistics dramatically more productive, I’ve seen surprisingly little research in this area. There is a huge volume of MCMC research, and as far as I can tell, not very much of it is focused on derivative based algorithms. There is some interesting work on Langevin MCMC (for example an adaptive Langevin algorithm, some convergence results, and an implicit Langevin scheme) and also some good work on 2nd derivative based methods (for example, optimization based work, some numerical work, and some recent work). But considering that Langevin MCMC came out 10 years ago, much more focus is needed.

I’m not sure why this approach seems neglected. It might be that research incentives don’t reward such generally applicable research, or that MCMC researchers do not see how simplified MCMC could dramatically improve the productivity of statistics, or perhaps researchers haven’t realized how automatic differentiation can democratize these algorithms.

Whatever the issue is, I hope that it can be overcome and MCMC researchers focus more on derivative based MCMC methods in the near future. MCMC sampling will become more reliable, and troubleshooting chains when they do have problems will become easier. This means that even people who are only vaguely aware of how MCMC works can use these algorithms, bringing us closer to the promise of Bayesian statistics.

Cox showed that if you want to represent ‘degrees of belief’ both using real numbers and consistent with classical logic, you must use probability theory (link). Bayes’ theorem is the theoretically correct way to update probabilities based on evidence. Bayesian statistics is the natural combination of these two facts.
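As a small illustration of updating probabilities with Bayes’ theorem, the sketch below applies it to two discrete hypotheses. The coin example, its numbers and the helper name `bayes_update` are all made up for the purpose.

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses: prior times likelihood, renormalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: u / total for h, u in unnormalized.items()}


# Hypothetical example: is a coin fair or biased toward heads?
prior = {"fair": 0.5, "biased": 0.5}
# Probability of observing one head under each hypothesis.
likelihood_heads = {"fair": 0.5, "biased": 0.8}

posterior = bayes_update(prior, likelihood_heads)
print(posterior)
```

Observing a single head shifts belief toward the biased hypothesis in proportion to how much better that hypothesis predicted the observation.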

Bayesian statistics has two chief advantages over other kinds of statistics:

  1. Bayesian statistics is conceptually simple.
    • This excellent book introduces statistics, some history and the whole of the theoretical foundations of Bayesian statistics in a mere 12 pages; the rest of the book is examples and methods.
    • Users of classical statistics very frequently misunderstand what p-values and confidence intervals are. In contrast, posterior distributions are exactly what you’d expect them to be.
    • After learning the basics, students can easily derive their own methods and set up their own problems. This is not at all true in classical statistics.
  2. Bayesian statistics is almost always explicitly model centric. It requires people to come up with a model which describes their problem. This has several advantages:
    • It’s often very easy to build a model that’s very closely tailored to your problem and know immediately how to solve it conceptually if not practically.
    • It makes it harder to be confused about what you’re doing.
    • It’s easier to recognize when your assumptions are bad.

Here, I expand a little on the advantages of Bayesian statistics.

The promise of Bayesian statistics is that with Bayesian statistics, actually conducting statistical inference will only be as difficult as coming up with a good model. Statistics education will focus on modeling techniques, graphical display of data and results and checking your assumptions, rather than on tests and calculations. People with only a college class or two worth of statistics will be able to fit any kind of model they can write down. Fitting a model will be a non-event.

If Bayesian statistics is so great, why isn’t it more widely used? Why isn’t the promise of Bayesian statistics the reality of statistics? Two reasons:

  1. Bayesian statistics has historically not been very widely used. Scientists and engineers have grown up using classical statistics, so in order for new scientists and engineers to communicate with their peers and elders, they must know classical statistics.
  2. Although Bayesian statistics is conceptually simple, solving for the posterior distribution is often computationally difficult. The reason for this is simple. If P is the number of parameters in the model, the posterior is a distribution in P dimensional space. In many models the number of parameters is quite large so computing summary statistics for the posterior distribution (mean, variance etc.) suffers from the curse of dimensionality. Naive methods are O(N^P).

In a future post I will explain why I think a new kind of MCMC algorithm is largely going to resolve problem #2 in the near future.

If you’d like to learn Bayesian statistics and you remember your basic calculus and basic probability reasonably well, I recommend Data Analysis by Sivia. Bayesian Statistics by Bolstad has an intro to probability theory and the useful calculus, but isn’t as useful.

I want to make the case that thinking of money as a product specifically designed to be used as money and produced by a producer is often a useful perspective, and that this perspective remains useful for government created money. From this perspective, the Federal Reserve (a branch of the Federal government) is the producer of US dollars and the Chinese government is the producer of the yuán. This perspective grew out of my ongoing debate with Silas.

It is not difficult to imagine money which is not produced by anyone. An economy that uses pure gold in no particular shape uses money which is not anyone’s product. There might be gold miners, but they do not produce gold for use as money necessarily, and it could be the case that gold is simply found on the ground occasionally. It is also easy to see the drawbacks of such a money. If gold is in nonstandard lumps it must be weighed and purity tested for each transaction. It also means that people must keep a real resource that might otherwise be used for some productive purpose, so it may mean gold is not used in the optimal manner.

One can ameliorate some of these problems by using a similar but separate product specifically designed to be used as money. A sedan is serviceable for transporting lumber, but a product specifically designed for the task, such as a truck, is much better. Perhaps some bank or government will start minting standard weight and purity gold coins specifically to be used as money for a fee, and people will come to prefer using these coins to gold lumps, perhaps trading at a premium to gold lumps. Now these coins have become a different product from gold lumps. The bank takes gold lumps and produces gold coins that have extra properties. These coins can meaningfully be said to be the product of that bank or government and not the product of any other bank or government, even if competitors produce very similar coins.

If such coins are to succeed it is important for them to bear the bank or government’s name or be otherwise branded. If the coins are not branded or brands are not respected, then it is easy for counterfeiters to ruin the product by producing similar but lower purity coins. However, it is important to distinguish between counterfeiting and merely competing products. A rival bank or government who mints their own coins with a different brand has produced a separate product, much as Gucci bags are not counterfeits of Chanel bags even if they look similar. If a competitor with a different brand produces lower quality coins, they will just ruin their own product, not anyone else’s.

This perspective applies very well to government monies. All government monies that I have seen bear a brand of that government (US dollars say “Federal reserve note”), and other people are not allowed to produce money with that brand. Some countries, including the US, allow competitors, and some countries do not (link). Most central banks, including the Federal Reserve, turn a profit from their activities.

Most methods of improving a non-product money will involve making money into a product, for the same reason that most methods of improving wild tomatoes as a food source involve making tomatoes into a product. It is almost certainly optimal for most money to be product money.

One implication of this view is that it is meaningful to talk about the optimal level of money production by a bank or government, for the same reasons it is meaningful to talk about the optimal level of bread production by bread producers: there will be some level of production that maximizes welfare (and/or producer profits, depending on your optimization criteria).

Another implication of this view is that government produced money is not necessarily special. It is possible that it is special because private producers can’t commit to appropriate production or because money production has some externality that private producers do not take into account, but this is something to be demonstrated rather than true a priori.

This is the first part of a planned introduction to monetary economics. I imagine it will develop slowly, but hopefully I will stick with it. I plan to first post sections here and then revise them and place them into a single document. Please leave a comment if you have a comment or criticism.

Monies have two major uses which distinguish them from all other goods:

  • Unit of Account – Prices are quoted in terms of money rather than other goods. For example, the price of a gallon of milk will be quoted as $1.59/gallon rather than .1 music lessons/gallon.
  • Medium of exchange – When people trade, they trade goods for money and then trade money for other goods. I usually cannot trade music lessons for groceries at the grocery store, but I can trade money for groceries at the grocery store.

These two uses are distinct and separable, but come together so often that we have a name for goods that have both uses. A good that is both a Unit of Account and a Medium of Exchange is called a Money.

A good can be a Medium of Exchange but not a Unit of Account. Postage stamps (not forever stamps) are a good example; you need to give stamps to the post office to mail a letter, but the price is given in terms of money (you need 43 cents worth of stamps).

For another example, consider an economy where wool is very common and used as the Medium of Exchange. However, wool is difficult to quantify: it has a mass which is nontrivial to weigh in large quantities but can be eyeballed effectively by experienced wool traders, and a quality which is difficult to quantify but can also be discerned by wool traders. Since wool is difficult to quantify, prices are not generally quoted but negotiated on the spot, so wool does not serve as a Unit of Account (there is none).

A good can also be a Unit of Account but not a Medium of Exchange. If prices in some market are quoted in terms of a good M (for example .1 music lessons/gallon) but with the understanding that the exchange will be conducted with gold (you will exchange .1 music lessons worth of gold (looking at other prices) for a gallon of milk) then M is a Unit of Account in that market but not a Medium of Exchange.

Properties of Units of Account

Units of Account require some properties to be workable as Units of Account. These properties can play an important role in the economics of money, but are not necessarily unique to Units of Account. These are some, but there are probably others as well:

  • Quantifiable – Units of Account must be quantifiable in some way in order to communicate prices. Many goods besides goods used as Units of Account are quantifiable.
  • Translatable – Units of Account must be able to be translated into meaningful terms of trade for an actual transaction. The preserved body of chairman Mao does not work well as a Unit of Account because it is difficult to translate “.1 bodies of Mao” into a meaningful quantity of any other goods.

Properties of Mediums of Exchange

Mediums of Exchange require some properties to be workable as Mediums of Exchange. These properties can play an important role in the economics of money, but are not necessarily unique to Mediums of Exchange. These are some, but there are probably others as well:

  • Store of Value – Since people hold a Medium of Exchange in order to use it for future purchases, it must be worth something in the future, so it must be able to effectively move resources through time. If you make $20 babysitting today, you can either spend it and consume today, or you can spend it next week and consume then. No one will use a good as a medium of exchange if it does not store value to some degree. Mediums of Exchange are not special as a store of value; many other goods are also stores of value over time. Anything you would call an ‘asset’ is a store of value. All financial assets are stores of value: stocks, bonds, options etc. Assets vary in how they store value; some assets rise in value, some assets decline in value. A Medium of Exchange is also not necessarily special in how well it stores value over time; it can rise in value (deflation) or drop in value (inflation), and it may even pay interest, like financial assets.
  • Transferable – If a good is not transferable to other agents, it cannot be used in exchange, so it cannot be a Medium of Exchange. Education is a Store of Value, but not transferable, so it can’t be used as a Medium of Exchange. Since there is a lower limit on transfer costs (zero) and many goods are near this limit, differences along this dimension are not usually important.
  • Measurable – The important qualities of a good (including quantity) must be measurable (not necessarily quantifiable) to be used as a Medium of Exchange. Since there is a lower limit on measurement errors and costs (zero) and many goods are near this limit, differences along this dimension are not usually important. Lots of other goods are measurable; water is measurable (gallons); cupcakes are measurable (mass, deliciousness (which may not be quantifiable, but is measurable)).

In the sections above, the only highlighted property is Store of Value because this is the one that can be significantly different across different monies and across time. It plays an important role in practical monetary economics, but the other properties do not.

Different monies

In monetary economics we frequently talk about “money” as if there were only one kind of money because we are usually focusing on one particular money. In reality there are many kinds of money; there are different currencies, and goods like bus tokens. Bus tokens are goods that are used in the same way as money is: bus ride prices are often quoted in terms of bus tokens (though not exclusively) and the bus will trade you a bus ride for bus tokens. Monetary economists would regard bus tokens as money. The difference between these kinds of monies is the set of markets where they are used as a Unit of Account and a Medium of Exchange. The set of markets that accept US dollars is much larger than the set of markets that use the bus tokens of a given bus system. Most US stores do not accept bus tokens, but they do accept US dollars. Likewise, most US stores do not quote prices in terms of bus tokens, but they do quote them in terms of US dollars. One can think of bus tokens as “bus money”, US dollars as “US money” and Euros as “Europe money”. The economics of money still applies to goods like bus tokens in the set of markets where they are used as money.

A while back, Nick Rowe had a post about the option value of cash. Rowe asserts that cash gives a real option to do anything. At first, I didn’t see a problem with this, but as part of my ongoing debate with Silas Barta, I’ve thought a bit more about this. I think Rowe’s point is technically correct, but misleading because he neglects to mention that all reasonably liquid stores of value give this option value. Reasonably liquid stores of value, primarily financial assets, can give you the option to purchase whatever you like later on. Now cash is more liquid than any other asset, so on the time scale of hours or days only cash gives you the option value you want, but on reasonable time scales (enough time for your brokerage firm to wire you the money) many financial assets will do.

It can also be misleading for another reason: it can be tempting to think that it’s cash itself that creates that option value, and that this option is free in the sense that it is available regardless of the state of the world. However, money is just a kind of asset. If the option of saving in terms of money sends false signals about the profitability of actual short run saving (in the form of trading promises or of doing actual real investment), this can cause problems. In the usual case, where cash pays 0% interest and market rates for short run saving are positive, say 2%, this leads people to inefficiently avoid holding money. In the case where cash pays 0% interest and market rates for short run saving are negative, say -2%, this causes the opposite problem: people try to hold too much cash.

If the option of waiting a short period of time is very valuable, so everyone wants to do it, this may cause short run rates to be very negative and the 0% cash option to be very distortionary.
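To make the two cases concrete, here is a rough sketch in Python of the comparison described above. The function names and the $100 balance are my own assumptions for illustration; the only figures taken from the argument are the 0% return on cash and the example market rates of 2% and -2%.

```python
# Illustrative sketch only: names and the $100 balance are assumptions,
# not part of any standard model.

def forgone_interest(market_rate, cash_rate=0.0, balance=100.0):
    """Interest given up over one period by holding `balance` as cash
    instead of saving it at the short run market rate.
    A negative result means cash pays more than the market does."""
    return balance * (market_rate - cash_rate)


def cash_distortion(market_rate, cash_rate=0.0):
    """Direction of the distortion when cash pays a fixed rate that
    differs from the short run market rate."""
    gap = market_rate - cash_rate
    if gap > 0:
        return "people hold too little cash"
    if gap < 0:
        return "people hold too much cash"
    return "no distortion"


for rate in (0.02, -0.02):
    print(rate, forgone_interest(rate), cash_distortion(rate))
```

The larger the gap between the market rate and the 0% on cash, in either direction, the larger the distortion; a very negative short run rate corresponds to the case in the paragraph above.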

Google has apparently been testing unmanned cars with some success (link).

I suspect the biggest impact of unmanned cars would be taxi services replacing private ownership of cars. I would guess that the biggest differential cost between private car ownership and taxi services is the need for a non-passenger driver; human labor is expensive. If you eliminate this cost for taxis, it would become much more practical to call up a car and have it come pick you up when you need to go somewhere instead of keeping your own car around. Unmanned taxis would also eliminate a big fraction of the need for street parking, since cars could be kept in central parking garages. This would increase space efficiency in medium-sized cities quite a lot.