This is technical, but I hope you will be able to follow it by the time you get your doctoral degree from UCLA Biostat. I wrote it shortly after I arrived at UCLA, while I was attempting to reconcile what I had learned in graduate school about the purpose of statistics with the activities other statisticians seemed to be engaged in. Now I realize that statistics and statistical analysis serve many other purposes as well.

The fundamental goal of statistical analysis is to form an approximation to the sampling density of a response Y given covariates X, which we write f(Y|X). This is called modeling. The problems of estimation and testing are problems of summarizing the density f(Y|X); they are not, and I think should not be, the centerpiece of statistical inference.
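To make this concrete, here is a minimal sketch in Python, assuming a purely hypothetical normal linear model for f(Y|X). The data, the model, and all the variable names are illustrative assumptions, not a prescription; the point is only that once f(Y|X) is approximated, familiar "estimates" like the conditional mean or a predictive interval fall out as summaries of that density.

```python
# A minimal sketch, assuming the working model f(Y|X) = Normal(b0 + b1*X, sigma^2).
# Once f(Y|X) is approximated, estimation and testing are summaries of it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 3, n)  # hypothetical data

# Maximum likelihood under the normal linear model (= least squares here)
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_hat = np.sqrt(np.mean((y - X @ beta_hat) ** 2))

def f_hat(y_new, x_new):
    """Approximation to the sampling density f(Y|X) at covariate x_new."""
    mean = beta_hat[0] + beta_hat[1] * x_new
    return stats.norm.pdf(y_new, loc=mean, scale=sigma_hat)

# "Estimation" as a summary of the approximated density: E[Y | X=5]
print("E[Y | X=5]:", beta_hat[0] + beta_hat[1] * 5)
# A predictive interval is another summary of the same density
print("90% interval at X=5:",
      stats.norm.ppf([0.05, 0.95],
                     loc=beta_hat[0] + beta_hat[1] * 5,
                     scale=sigma_hat))
```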

The primary emphasis of much statistical effort has traditionally been on proofs that various estimators converge asymptotically. Another major effort goes into deriving the asymptotic sampling distributions of those estimators. This is unfortunate, because neither is the primary problem of statistical data analysis, even if both have some independent mathematical interest.

Consider the most celebrated papers and tools of our discipline. These are generally papers that develop new statistical models and computing techniques. A few important examples are Box and Cox (1964), Nelder and Wedderburn (1972), and Laird and Ware (1982), along with D.R. Cox's contributions to logistic regression and other statistical models. Important milestones are the development of statistical computing systems like SAS and S for analyzing data. Compare the attention Cox's papers on modeling receive with the attention paid to his papers on likelihood inference. Perhaps the exception to this general rule is the attention paid to Efron (1979), the first bootstrap paper; the bootstrap seems to rank in importance alongside Fisher's idea of taking the second derivative of the log likelihood, the basis of the Fisher information. Another important recent development in statistical computing is the introduction of Markov chain Monte Carlo methods for Bayesian inference.

Essentially what most modeling papers do is set up a family of models f(Y|theta, X) and then propose methods to estimate theta. Estimating theta yields a plug-in estimate of f(Y|X), namely f(Y|theta-hat, X). What these papers are really doing is proposing a model f(Y|theta, X) that matches the known prior information about a particular problem. Unfortunately, classical statistics does not like to make this discussion explicit, so we have not developed the language and tools to make the model specification problem easier. Bayesian inference may have these tools, but at the moment they exist only in vestigial form.
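Here is the same recipe as a short sketch, again with an entirely hypothetical family: a Poisson model f(Y|theta, X) with log-mean theta0 + theta1*X. The two steps are exactly the ones described above: estimate theta within the proposed family, then plug theta-hat back in to approximate f(Y|X).

```python
# A minimal sketch of the plug-in recipe, assuming a (hypothetical) Poisson
# family f(Y|theta, X) with log-mean theta0 + theta1*X.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = rng.poisson(np.exp(0.5 + 0.8 * x))  # hypothetical data

def neg_log_lik(theta):
    """Negative log likelihood of the assumed family f(Y|theta, X)."""
    mu = np.exp(theta[0] + theta[1] * x)
    return -np.sum(stats.poisson.logpmf(y, mu))

# Step 1: estimate theta within the proposed family (maximum likelihood)
theta_hat = optimize.minimize(neg_log_lik, x0=[0.0, 0.0]).x

# Step 2: the plug-in approximation f-hat(Y|X) = f(Y|theta-hat, X)
def f_hat(y_new, x_new):
    mu = np.exp(theta_hat[0] + theta_hat[1] * x_new)
    return stats.poisson.pmf(y_new, mu)

print("theta-hat:", theta_hat)
print("f-hat(Y=2 | X=0):", f_hat(2, 0.0))
```

Everything of interest about Y given X is then read off the plugged-in density; whether the family f(Y|theta, X) was well chosen in the first place is the model specification problem that, as noted above, we lack good language for.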