Why Be Bayesian? Let Me Count the Ways
In answer to an old friend's question.
- Bayesians have more fun.
- Our conferences are in better places too.
- It's the model, not the estimator.
- Life's too short to be a frequentist: In an infinite number of replications ...
- Software works better.
- Rather surprisingly, Bayesian software is a lot more general than frequentist software.
- Small-sample inference comes standard with most Bayesian model fitting these days.
- But if you like your inference asymptotic, that's available too; it's just not high on anyone's priority list.
- We can handle the no-data problem, all the way up to very large problems.
- Don't need a sample large enough to justify a bootstrap.
- Hierarchical random effects models are better fit with Bayesian models and software.
- If a variance component is small, the natural Bayes model doesn't allow zero as an estimate, while the natural maximum likelihood algorithms do. If you get a zero estimate, then you're going to get poor estimates of the standard errors of the fixed effects. [More discussion omitted.] A small numerical sketch follows this bullet.
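A minimal numerical sketch of that point, assuming a one-way random-effects model with the within-group variance treated as known and a half-normal prior on the between-group standard deviation (both assumptions for illustration): the profile likelihood can peak at exactly zero, while the posterior mean is always positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n, sigma, tau_true = 8, 5, 1.0, 0.2        # groups, obs per group, within-group sd, small between-group sd
group_effects = 2.0 + tau_true * rng.standard_normal(m)
y = group_effects[:, None] + sigma * rng.standard_normal((m, n))
ybar = y.mean(axis=1)                          # group means: N(mu, tau^2 + sigma^2/n)

def loglik(tau):
    """Log-likelihood of the between-group sd tau, with mu profiled out at the grand mean."""
    s = np.sqrt(tau**2 + sigma**2 / n)
    return stats.norm.logpdf(ybar, loc=ybar.mean(), scale=s).sum()

tau_grid = np.linspace(0.0, 2.0, 2001)
ll = np.array([loglik(t) for t in tau_grid])
tau_mle = tau_grid[np.argmax(ll)]              # can be exactly 0 when the group means sit close together

# Grid posterior with a half-normal(1) prior on tau: the posterior mean is strictly positive.
log_post = ll + stats.halfnorm.logpdf(tau_grid, scale=1.0)
w = np.exp(log_post - log_post.max())
w /= w.sum()
tau_post_mean = (tau_grid * w).sum()

print(f"Profile ML estimate of tau: {tau_mle:.3f}")
print(f"Posterior mean of tau:      {tau_post_mean:.3f}")
```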
- Can handle problems where there are more parameters than data.
- Logistic regression models fit better with Bayes
- If there's perfect separation on a particular variable, the maximum likelihood estimate of the coefficient is plus or minus infinity, which isn't a good estimate.
- Bayesian modeling offers (doesn't guarantee it, there's no insurance against stupidity) the opportunity to do the estimation correctly.
- Same thing if you're trying to estimate a very tiny (or very large) probability. Suppose you observe 20 out of 20 successes on something that you know doesn't have a 100% success rate. (A small sketch follows this list.)
- To rephrase a bit: in small samples or with rare events, Bayesian estimates shrink towards sensible point estimates (if your prior is sensible), thus avoiding the large variance of raw point estimates.
- The bias-variance trade-off is working in your favor.
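A minimal sketch of the 20-out-of-20 example. The uniform Beta(1, 1) prior is an assumption here; any prior that doesn't rule out values below 1 makes the same point.

```python
from scipy import stats

successes, trials = 20, 20
p_mle = successes / trials                          # 1.0: claims a 100% success rate
a, b = 1 + successes, 1 + (trials - successes)      # Beta(1,1) prior -> Beta(21,1) posterior
posterior = stats.beta(a, b)

print(f"MLE:            {p_mle:.3f}")
print(f"Posterior mean: {posterior.mean():.3f}")    # 21/22, about 0.955
print(f"95% interval:   {posterior.ppf([0.025, 0.975]).round(3)}")
```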
- Frequentists keep reinventing Bayesian methods
- Shrinkage estimates
- Empirical Bayes
- Lasso
- Penalized likelihood
- Ridge regression (a small sketch follows this list)
- James-Stein estimators
- Regularization
- Pitman estimation
- Integrated likelihood
- In other words, it's just not possible to analyze complex data structures without Bayesian ideas.
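A minimal sketch of one reinvention from the list above: the ridge regression estimate is exactly the posterior mean (and mode) of the coefficients under independent mean-zero normal priors in a Gaussian linear model. The simulated data and the penalty value are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, lam = 50, 5, 1.0, 3.0
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ beta_true + sigma * rng.standard_normal(n)

# Ridge / penalized least squares: (X'X + lam I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Bayesian posterior mean with beta_j ~ Normal(0, sigma^2/lam) and y | beta ~ Normal(X beta, sigma^2 I)
prior_precision = (lam / sigma**2) * np.eye(p)
post_cov = np.linalg.inv(X.T @ X / sigma**2 + prior_precision)
beta_bayes = post_cov @ (X.T @ y / sigma**2)

print(np.allclose(beta_ridge, beta_bayes))   # True: same point estimate, two derivations
```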
- Your answers are admissible if you're Bayesian but usually not if you're a frequentist.
- Admissibility means never having to say you're sorry.
- Alternatively, admissibility means that someone else can't prove that they can do a better job than you.
- And if you're a frequentist, someone is clogging our journals with proofs that the latest idiocy is or isn't admissible.
- Unless they're clogging them with yet more ways to estimate the smoothing parameter of a nonparametric estimator.
- Bayesian models are generalizations of classical models. That's what the prior buys you: more models.
- Can handle discrete, categorical, ordered categorical, trees, densities, matrices, missing data and other odd parameter types.
- Data and parameters are treated on an equal footing.
- I would argue that cross-validation works because it approximates Bayesian model selection tools.
- Bayesian Hypothesis Testing
- Treats the null and alternative hypotheses on equal terms
- Can handle two or more than two hypotheses
- Can handle hypotheses that are
- Disjoint
- Nested
- Overlapping but neither disjoint nor nested
- Gives you the probability that the alternative hypothesis is true. (A small sketch follows this list.)
- Classical inference can only handle the nested null hypothesis problem.
- We're all probably misusing p-values anyway.
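A minimal sketch of Bayesian hypothesis testing for a binomial proportion: H0 says p = 0.5, H1 puts a uniform prior on p, and the Bayes factor turns directly into P(H1 | data). The equal prior probabilities on the two hypotheses are an assumption; change them if you know better.

```python
import numpy as np
from scipy import stats
from scipy.special import betaln, comb

k, n = 14, 20
m0 = stats.binom.pmf(k, n, 0.5)                       # marginal likelihood of the data under H0
m1 = comb(n, k) * np.exp(betaln(k + 1, n - k + 1))    # under H1: Binomial integrated against Beta(1,1)

bf10 = m1 / m0                                        # Bayes factor for H1 over H0
post_h1 = bf10 / (1 + bf10)                           # P(H1 | data) with equal prior odds
print(f"Bayes factor (H1 vs H0): {bf10:.2f}")
print(f"P(H1 | data):            {post_h1:.3f}")
```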
- Provides a language for talking about modeling and uncertainty that is missing in classical statistics.
- And thus provides a language for developing new models for new data sets or scientific problems.
- Provides a language for thinking about shrinkage estimators, why we want to use them, and how to specify the shrinkage.
- Bayesian statistics permits discussion of the sampling density of the data given the unknown parameters.
- Unfortunately this is all that frequentist statistics allows you to talk about.
- Additionally: Bayesians can discuss the distribution of the data unconditional on the parameters.
- Bayesian statistics also allows you to discuss the distribution of the parameters.
- You may discuss the distribution of the parameters given the data. This is called the posterior, and is the conclusion of a Bayesian analysis.
- You can talk about problems that classical statistics can't handle: The probability of nuclear war for example.
- Novel computing tools -- but you can often use your old tools as well.
- Bayesian methods allow pooling of information from diverse data sources.
- Data can come from books, journal articles, older lab data, previous studies, people, experts, the horse's mouth, a rat's a**, or it may have been collected in the traditional form of data.
- It isn't automatic, but there is language for thinking about how to do this pooling. (A small sketch follows this list.)
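A minimal sketch of pooling, assuming the prior information can be summarized as a normal distribution and the new data as an estimate with a standard error; the specific numbers are made up for illustration.

```python
import numpy as np

prior_mean, prior_sd = 1.0, 0.5      # from the older source (previous study, expert, the horse's mouth)
data_mean, data_se = 0.4, 0.3        # summary of the new data

# Normal prior + normal likelihood: the posterior is normal with a precision-weighted mean.
w_prior, w_data = 1 / prior_sd**2, 1 / data_se**2
post_var = 1 / (w_prior + w_data)
post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)

print(f"Posterior mean: {post_mean:.3f}, posterior sd: {np.sqrt(post_var):.3f}")
```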
- Less work.
- Bayesian inference is via laws of probability, not by some ad hoc procedure that you need to invent for every problem or validate every time you use it.
- Don't need to figure out an estimator.
- Once you have a model and data set, the conclusion is a computing problem, not a research problem.
- Don't need to prove a theorem to show that your posterior is sensible. It is sensible if your assumptions are sensible.
- Don't need to publish a bunch of papers to figure out sensible answers to a novel problem.
- For example, estimating a series of means $\mu_1, \mu_2, \ldots$ that you know are ordered, $\mu_j \le \mu_{j+1}$, is a computing problem in Bayesian inference, but it was the source of numerous papers in the frequentist literature. Finding a (good) frequentist estimator, its standard errors, and confidence intervals took lots of papers to figure out. (A small sketch follows this bullet.)
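A minimal sketch of the ordered-means example, assuming flat priors restricted to the ordered region and normal likelihoods summarized by group means and standard errors (numbers made up for illustration). The constrained posterior is just the unconstrained one truncated to the ordered region, so rejection sampling gives exact draws: a computing problem, not a research problem.

```python
import numpy as np

rng = np.random.default_rng(2)
ybar = np.array([1.3, 1.1, 1.8, 1.7])    # observed group means (note the first two are out of order)
se = np.array([0.3, 0.3, 0.3, 0.3])      # their standard errors

draws = ybar + se * rng.standard_normal((200_000, 4))       # unconstrained posterior draws
keep = draws[np.all(np.diff(draws, axis=1) >= 0, axis=1)]   # keep only the ordered draws

print("acceptance rate:", len(keep) / len(draws))
print("posterior means under the order constraint:", keep.mean(axis=0).round(3))
print("95% interval for mu_2:", np.quantile(keep[:, 1], [0.025, 0.975]).round(3))
```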
- Yes, you can still use SAS.
- Or R or Stata.
- Can incorporate utility functions, if you have one.
- Odd bits of other information can be incorporated into the analysis, for example
- That a particular parameter, usually allowed to be positive or negative, must be positive.
- That a particular parameter is probably positive, but not guaranteed to be positive. (A small sketch follows this list.)
- That a given regression coefficient should be close to zero.
- That group one's mean is larger than group two's mean.
- That the data come from a distribution that is not Poisson, Binomial, Exponential, or Normal. For example, the data may be better modeled by a t or a gamma.
- That a collection of parameters comes from a distribution that is skewed or has long tails.
- Bayesian nonparametrics can let you model an unknown density as a nonparametric mixture of normals (or other densities). The uncertainty in estimating this distribution is incorporated into inferences about group means and regression coefficients.
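A minimal sketch of the "probably positive, but not guaranteed" item, encoding that belief as a normal prior with most of its mass above zero and combining it with a normal summary of the data; the specific numbers are assumptions for illustration.

```python
from scipy import stats

prior_mean, prior_sd = 0.5, 0.5          # prior says "probably positive": about 84% of its mass is above 0
est, se = -0.1, 0.4                      # estimate and standard error from the data

# Normal prior + normal likelihood: conjugate update, then read off P(theta > 0).
w_prior, w_data = 1 / prior_sd**2, 1 / se**2
post_var = 1 / (w_prior + w_data)
post_mean = post_var * (w_prior * prior_mean + w_data * est)
post_sd = post_var ** 0.5

print(f"P(theta > 0) a priori: {stats.norm.sf(0, prior_mean, prior_sd):.3f}")
print(f"P(theta > 0 | data):   {stats.norm.sf(0, post_mean, post_sd):.3f}")
```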
- Bayesian modeling is about the science.
- You can calculate the probability that your hypothesis is true.
- Bayesian modeling asks whether this model describes the data, mother nature, the data-generating process correctly, or at least sufficiently correctly.
- Classical inference is all about the statistician and the algorithm, not the science.
- In repeated samples, how often (or how accurately) does this algorithm/method/model/inference scheme give the right answer?
- Classical inference is more about the robustness (in repeated sampling) of the procedure. In that way, it provides robustness results for Bayesian methods.
- Bayesian methods have had notable successes, to wit:
- Covariate selection in regression problems
- Model selection
- Model mixing
- And mixture models
- Missing data
- Multi-level and hierarchical models
- Phylogeny
The bottom line: More tools. Faster progress.