comparative | Rob Weiss | Biostat Faculty

Why Be Bayesian? Let Me Count the Ways

In answer to an old friend's question.

Rather surprisingly, Bayesian software is a lot more general than frequentist software.

Small sample inference comes standard with most Bayesian model fitting these days.

But if you like your inference asymptotic, that's available, just not high on anyone's priority list.
We can handle the no-data problem, all the way up to very large problems.
Don't need a large enough sample to allow for a bootstrap.

Hierarchical random effects models are better fit with Bayesian models and software.

If a variance component is small, the natural Bayes model doesn't allow zero as an estimate, while the natural maximum likelihood algorithms do allow zero. If you get a zero estimate, then you're going to get poor estimates of standard errors of fixed effects. [More discussion omitted.]
Can handle problems where there are more parameters than data.

If there's perfect separation on a particular variable, the maximum likelihood estimate of the coefficient is plus or minus infinity which isn't a good estimate.
Bayesian modeling offers (doesn't guarantee it, there's no insurance against stupidity) the opportunity to do the estimation correctly.
Same thing if you're trying to estimate a very tiny (or very large) probability. Suppose you observe 20 out of 20 successes on something that you know doesn't have 100% successes.
To rephrase a bit: In small samples or with rare events, Bayesian estimates shrink towards sensible point estimates (if your prior is sensible), thus avoiding the large variance of unshrunk point estimates.
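As a small illustration of the 20-out-of-20 case, here is a minimal sketch assuming a uniform Beta(1, 1) prior on the success probability. The prior choice and the function names are my assumptions for illustration; any prior that puts no mass at exactly 100% behaves similarly.

```python
def beta_binomial_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of the Beta posterior for a binomial success probability.

    Assumes a Beta(prior_a, prior_b) prior; Beta(1, 1) is uniform on [0, 1].
    """
    failures = trials - successes
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


successes, trials = 20, 20

# Maximum likelihood estimate: the observed proportion, here exactly 1.
mle = successes / trials

# Posterior mean under the assumed uniform prior: (20 + 1) / (20 + 2).
post_a, post_b = beta_binomial_posterior(successes, trials)
posterior_mean = beta_mean(post_a, post_b)

print(f"MLE: {mle:.3f}")
print(f"Posterior mean under Beta(1, 1): {posterior_mean:.3f}")
```

The maximum likelihood estimate sits on the boundary at 1, while the posterior mean is 21/22, pulled slightly back towards the prior mean of 1/2.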

Shrinkage estimates
Empirical Bayes
Lasso
Penalized likelihood
Ridge regression
James-Stein estimators
Regularization
Pitman estimation
Integrated likelihood
In other words, it's just not possible to analyze complex data structures without Bayesian ideas.
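To make one entry in that list concrete, ridge regression can be read as a posterior mode. This is a sketch under assumptions that are mine, not part of the list above: a normal linear model with known error variance $\sigma^2$ and independent $N(0, \tau^2)$ priors on the coefficients.

$$
y \mid \beta \sim N(X\beta, \sigma^2 I), \qquad \beta \sim N(0, \tau^2 I)
$$

$$
-\log p(\beta \mid y) = \frac{1}{2\sigma^2} \lVert y - X\beta \rVert^2 + \frac{1}{2\tau^2} \lVert \beta \rVert^2 + \text{const}
$$

$$
\hat\beta = (X^\top X + \lambda I)^{-1} X^\top y, \qquad \lambda = \sigma^2 / \tau^2
$$

Minimizing the negative log posterior is the ridge criterion, and the penalty $\lambda$ is the ratio of the error variance to the prior variance: a tighter prior means more shrinkage.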

Your answers are admissible if you're Bayesian but usually not if you're a frequentist.

Admissibility means never having to say you're sorry.
Alternatively, admissibility means that someone else can't prove that they can do a better job than you.
And if you're a frequentist, someone is clogging our journals with proofs that the latest idiocy is admissible or not.
Unless they are clogging it with yet more ways to estimate the smoothing parameter for a nonparametric estimator.

Bayesian models are generalizations of classical models. That's what the prior buys you: more models.
Can handle discrete, categorical, ordered categorical, trees, densities, matrices, missing data and other odd parameter types.
Data and parameters are treated on an equal footing.
I would argue that cross-validation works because it approximates Bayesian model selection tools.
Bayesian Hypothesis Testing

Provides a language for talking about modeling and uncertainty that is missing in classical statistics.

And thus provides a language for developing new models for new data sets or scientific problems.
Provides a language for thinking about shrinkage estimators and why we want to use them and how to specify the shrinkage.
Bayesian statistics permits discussion of the sampling density of the data given the unknown parameters.
Unfortunately this is all that frequentist statistics allows you to talk about.
Additionally: Bayesians can discuss the distribution of the data unconditional on the parameters.
Bayesian statistics also allows you to discuss the distribution of the parameters.
You may discuss the distribution of the parameters given the data. This is called the posterior, and is the conclusion of a Bayesian analysis.
You can talk about problems that classical statistics can't handle: The probability of nuclear war for example.
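The distributions named in the list above can be written down compactly. The notation is my own assumption: $y$ for the data and $\theta$ for the unknown parameters.

$$
\underbrace{p(y \mid \theta)}_{\text{sampling density}}, \qquad
\underbrace{p(\theta)}_{\text{prior}}, \qquad
\underbrace{p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta}_{\text{data unconditional on parameters}}
$$

$$
\underbrace{p(\theta \mid y)}_{\text{posterior}} = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}
$$

Only the first of these is available in a purely frequentist discussion; the other three need a prior.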

Data can come from books, journal articles, older lab data, previous studies, people, experts, the horse's mouth, a rat's a**, or it may have been collected in the traditional form of data.
It isn't automatic, but there is language to think about how to do this pooling.

Bayesian inference is via laws of probability, not by some ad hoc procedure that you need to invent for every problem or validate every time you use it.
Don't need to figure out an estimator.
Once you have a model and data set, the conclusion is a computing problem, not a research problem.
Don't need to prove a theorem to show that your posterior is sensible. It is sensible if your assumptions are sensible.
Don't need to publish a bunch of papers to figure out sensible answers given a novel problem.
For example, estimating a series of means $\mu_1, \mu_2, \ldots$ that you know are ordered $\mu_j \le \mu_{j+1}$ is a computing problem in Bayesian inference, but was the source of numerous papers in the frequentist literature. Finding a (good) frequentist estimator and finding standard errors and confidence intervals took lots of papers to figure out.
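Here is a rough sketch of why the ordered-means problem is "just computing". The assumptions are mine: each group mean has a normal likelihood with a known standard error, and the prior is flat over the ordered region, so the posterior is a product of normals truncated to $\mu_1 \le \mu_2 \le \cdots$. The data values and function name are hypothetical, and rejection sampling is only one simple way to do it.

```python
import random


def sample_ordered_means(ybar, se, n_draws=5000, seed=0):
    """Draw from the posterior of ordered means by rejection sampling.

    Each proposal draws mu_j ~ Normal(ybar[j], se[j]) independently, which is
    the unconstrained posterior under a flat prior. Proposals that violate
    mu_1 <= mu_2 <= ... are discarded, leaving draws from the constrained
    posterior.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_draws:
        draw = [rng.gauss(m, s) for m, s in zip(ybar, se)]
        if all(a <= b for a, b in zip(draw, draw[1:])):
            kept.append(draw)
    return kept


# Hypothetical sample means (note the second is out of order) and standard errors.
ybar = [1.0, 0.8, 1.5]
se = [0.3, 0.3, 0.3]

draws = sample_ordered_means(ybar, se)

# Posterior means and 95% intervals come straight from the draws.
for j in range(len(ybar)):
    values = sorted(d[j] for d in draws)
    mean_j = sum(values) / len(values)
    lower = values[int(0.025 * len(values))]
    upper = values[int(0.975 * len(values))]
    print(f"mu_{j + 1}: mean {mean_j:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")

post_means = [sum(d[j] for d in draws) / len(draws) for j in range(len(ybar))]
```

Point estimates, standard errors, and intervals all fall out of the same set of draws, with no separate theory needed for the constraint. Rejection sampling gets slow when the ordering is very unlikely under the unconstrained posterior; a Gibbs sampler with truncated normals would be the usual replacement.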

Can incorporate utility functions, if you have one.
Odd bits of other information can be incorporated into the analysis, for example:

That a particular parameter, usually allowed to be positive or negative, must be positive.
That a particular parameter is probably positive, but not guaranteed to be positive.
That a given regression coefficient should be close to zero.
That group one's mean is larger than group two's mean.
That the data comes from a distribution that is not a Poisson, Binomial, Exponential or Normal. For example, the data may be better modeled by a t or a gamma distribution.
That a collection of parameters comes from a distribution that is skewed, or has long tails.
Bayesian nonparametrics can allow you to model an unknown density as a non-parametric mixture of normals (or other density). The uncertainty in estimating this distribution is incorporated in making inferences about group means and regression coefficients.

You can calculate the probability that your hypothesis is true.
Bayesian modeling asks if this model describes the data, mother nature, the data generating process correctly, or sufficiently correctly.
Classical inference is all about the statistician and the algorithm, not the science.
In repeated samples, how often (or how accurately) does this algorithm/method/model/inference scheme give the right answer?
Classical inference is more about the robustness (in repeated sampling) of the procedure. In that way, it provides robustness results for Bayesian methods.
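As a toy illustration of calculating the probability that a hypothesis is true, here is a sketch for the hypothesis "group one's success rate is larger than group two's". The counts are hypothetical, and the independent uniform Beta(1, 1) priors on the two rates are my assumption.

```python
import random


def prob_first_rate_larger(s1, n1, s2, n2, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(p1 > p2 | data).

    Assumes independent Beta(1, 1) priors, so each posterior is
    Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        p1 = rng.betavariate(1 + s1, 1 + n1 - s1)
        p2 = rng.betavariate(1 + s2, 1 + n2 - s2)
        if p1 > p2:
            hits += 1
    return hits / n_draws


# Hypothetical data: 15 of 20 successes in group one, 9 of 20 in group two.
prob = prob_first_rate_larger(15, 20, 9, 20)
print(f"P(p1 > p2 | data) is approximately {prob:.3f}")
```

The number printed is a direct probability statement about the hypothesis given the data and the prior, which is not the same thing as a p-value.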

The bottom line: More tools. Faster progress.
