Cox showed that if you want to represent 'degrees of belief' using real numbers, in a way consistent with classical logic, you must use probability theory (link). Bayes' theorem is the theoretically correct way to update probabilities based on evidence. Bayesian statistics is the natural combination of these two facts.
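To make the Bayes' theorem update concrete, here is a minimal sketch of my own (the coin example and numbers are illustrative, not from the text): we maintain beliefs over three candidate biases for a coin and update them after each flip.

```python
# Bayes' theorem update over a discrete set of hypotheses.
# Hypotheses: the coin's heads-probability h is 0.25, 0.5, or 0.75.
priors = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}  # uniform prior

def update(prior, heads):
    # Bayes' theorem: posterior(h) ∝ prior(h) * P(observation | h)
    unnorm = {h: p * (h if heads else 1 - h) for h, p in prior.items()}
    z = sum(unnorm.values())  # normalizing constant, P(observation)
    return {h: u / z for h, u in unnorm.items()}

posterior = priors
for flip in [True, True, True, False]:  # observe three heads, one tail
    posterior = update(posterior, flip)

print(posterior)  # mass shifts toward the h = 0.75 hypothesis
```

Each observation reweights the hypotheses by their likelihood and renormalizes; the posterior after all four flips is exactly the distribution you would want, with no test statistics or reference distributions involved.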

Bayesian statistics has two chief advantages over other kinds of statistics:

1. Bayesian statistics is conceptually simple.
• This excellent book introduces statistics, some history, and the whole of the theoretical foundations of Bayesian statistics in a mere 12 pages; the rest of the book is examples and methods.
• Users of classical statistics very frequently misunderstand what p-values and confidence intervals are. In contrast, posterior distributions are exactly what you’d expect them to be.
• After learning the basics, students can easily derive their own methods and set up their own problems. This is not at all true in classical statistics.
2. Bayesian statistics is almost always explicitly model-centric. It requires people to come up with a model that describes their problem. This has several advantages:
• It's often very easy to build a model that's closely tailored to your problem and to know immediately how to solve it, conceptually if not practically.
• It makes it harder to be confused about what you’re doing.
• It’s easier to recognize when your assumptions are bad.
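A sketch of what this model-centric workflow looks like (the model and numbers are my own illustration, not from the text): you write down a model, and the posterior follows from it directly. Here, data are modeled as Normal with known noise and an unknown mean, with a Normal prior on that mean; this conjugate model has a closed-form posterior.

```python
# Model: data[i] ~ Normal(mu, sigma^2), sigma known.
# Prior:  mu ~ Normal(m0, s0^2).
# The posterior for mu is again Normal, with closed-form mean and std.

def posterior_mu(data, sigma, m0, s0):
    n = len(data)
    prec = 1 / s0**2 + n / sigma**2                       # posterior precision
    mean = (m0 / s0**2 + sum(data) / sigma**2) / prec     # precision-weighted mean
    return mean, prec**-0.5                               # posterior mean and std

# Three measurements with known noise sigma=0.5 and a vague prior:
mean, std = posterior_mu([4.8, 5.1, 5.3], sigma=0.5, m0=0.0, s0=10.0)
```

Because the model is stated explicitly, every assumption (known noise, Normal errors, the vague prior) is visible on the page, which is exactly what makes bad assumptions easier to spot.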

Here, I expand a little on the advantages of Bayesian statistics.

The promise of Bayesian statistics is that with Bayesian statistics, actually conducting statistical inference will only be as difficult as coming up with a good model. Statistics education will focus on modeling techniques, graphical display of data and results, and checking your assumptions, rather than on tests and calculations. People with only a college class or two worth of statistics will be able to fit any kind of model they can write down. Fitting a model will be a non-event.

If Bayesian statistics is so great, why isn’t it more widely used? Why isn’t the promise of Bayesian statistics the reality of statistics? Two reasons:

1. Bayesian statistics has historically not been very widely used. Scientists and engineers have grown up using classical statistics, so in order for new scientists and engineers to communicate with their peers and elders, they must know classical statistics.
2. Although Bayesian statistics is conceptually simple, solving for the posterior distribution is often computationally difficult. The reason for this is simple. If P is the number of parameters in the model, the posterior is a distribution in P-dimensional space. In many models the number of parameters is quite large, so computing summary statistics for the posterior distribution (mean, variance, etc.) suffers from the curse of dimensionality: naive grid methods that evaluate the posterior at N points per dimension require O(N^P) evaluations.
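A toy illustration of that O(N^P) cost (my own sketch, with made-up grid sizes): computing a posterior mean by summing over a grid of N points per parameter requires N^P posterior evaluations, which explodes as P grows.

```python
import itertools
import math

def grid_posterior_mean(log_post, P, N=50, lo=-5.0, hi=5.0):
    """Approximate the posterior mean by brute-force grid summation.

    Evaluates the (unnormalized) posterior at N**P grid points.
    """
    axis = [lo + (hi - lo) * i / (N - 1) for i in range(N)]
    total = 0.0
    first_moment = [0.0] * P
    for point in itertools.product(axis, repeat=P):  # N**P evaluations
        w = math.exp(log_post(point))
        total += w
        for j, x in enumerate(point):
            first_moment[j] += w * x
    return [m / total for m in first_moment]

# A standard normal "posterior" in P dimensions; the mean should be ~0.
log_std_normal = lambda xs: -0.5 * sum(x * x for x in xs)
mean_2d = grid_posterior_mean(log_std_normal, P=2)  # already 50**2 = 2500 evals
```

At P = 2 this is 2,500 evaluations; at P = 10 with the same grid it would be 50^10 ≈ 10^17, which is why naive methods are hopeless for high-dimensional models.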

In a future post I will explain why I think a new kind of MCMC algorithm is largely going to resolve problem #2 in the near future.
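For flavor, here is a minimal random-walk Metropolis sampler (a generic, classical MCMC method of my own sketching, not the new algorithm the post alludes to): instead of visiting N^P grid points, it draws correlated samples from the posterior, and summary statistics come from averaging those samples.

```python
import math
import random

def metropolis(log_post, x0, steps=5000, scale=1.0, seed=0):
    """Random-walk Metropolis: sample from exp(log_post) starting at x0."""
    rng = random.Random(seed)
    x, lp = list(x0), log_post(x0)
    samples = []
    for _ in range(steps):
        # Propose a Gaussian perturbation of the current point.
        prop = [xi + rng.gauss(0.0, scale) for xi in x]
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(list(x))
    return samples

# Sample a 2-D standard normal posterior and estimate its mean (~[0, 0]).
log_std_normal = lambda xs: -0.5 * sum(v * v for v in xs)
draws = metropolis(log_std_normal, [0.0, 0.0])
mean = [sum(coord) / len(draws) for coord in zip(*draws)]
```

The per-step cost is one posterior evaluation regardless of P, which is why MCMC, rather than grid summation, is the workhorse of Bayesian computation; the catch is getting the chain to mix well, which is where better algorithms come in.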

If you'd like to learn Bayesian statistics and you remember your basic calculus and basic probability reasonably well, I recommend Data Analysis by Sivia. Bayesian Statistics by Bolstad includes an introduction to probability theory and the relevant calculus, but isn't as useful overall.