Open and Closed Intervals: A Problem for ML Inference But Not Bayes
Does maximum likelihood inference have a support problem?
Maximum likelihood (ML) has a problem with parameters that take values in open sets (Is that all of them? Almost!). Bayesian inference doesn't obviously have this problem.
Briefly, using maximum likelihood, how many of you out there know how to put a standard deviation/standard error or a confidence interval on a probability when you get a result of y = 0 out of n = 20 independent Bernoulli(\pi) trials? How about if you get a variance estimate of \hat{\tau} = 0 in a random intercept model for longitudinal data? Self-check: I know how to set a CI for \pi; I did it once on a test too, fortunately a take-home test. But I don't know how to do this for the variance parameter. Maybe I could figure it out by inverting a likelihood ratio test or something. And I haven't a clue how to construct an SE in these situations. Even if we can solve the question for \pi and \tau, the serious problem is constructing a confidence interval for \pi_1 - \pi_2, the difference in probability of success between the treatment and control groups. For the random intercept model, the problem is comparing the treatment mean \mu_2 to the control mean \mu_1, where we've randomized groups to treatment or control and we need to estimate the intraclass correlation coefficient to properly estimate the SE of the estimated mean difference.
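To make the y = 0 out of n = 20 case concrete, here is a minimal Python sketch. The function names are mine, and I am assuming "the usual SE formula" means the textbook Wald formula sqrt(p(1 - p)/n). The second function is one standard workaround for the zero-successes case: solve (1 - \pi)^n = \alpha for a one-sided upper bound. It is not necessarily what you were taught on your take-home test.

```python
import math

def wald_interval(y, n, z=1.96):
    """Textbook Wald estimate, SE, and interval for a binomial proportion."""
    p_hat = y / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

def upper_bound_zero_successes(n, alpha=0.05):
    """One-sided exact upper bound for pi when y = 0.

    Solves P(Y = 0 | pi) = (1 - pi)^n = alpha for pi.
    """
    return 1 - alpha ** (1 / n)

p_hat, se, ci = wald_interval(0, 20)
print("MLE:", p_hat, "Wald SE:", se, "Wald CI:", ci)
print("Exact one-sided upper bound:", round(upper_bound_zero_successes(20), 4))
```

The Wald pieces all collapse to zero: an estimate of 0, an SE of 0, and an "interval" that is the single point 0. The special-case bound gives something sensible (about 0.14), but it is a different method bolted on for the boundary case, and it still does not hand you an SE.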
But these difficulties are not problems I ever have using Bayesian inference. Never. Not once. Data yes: I've had y=0 out of n=20 walk in the door as part of a short-term consulting problem. And variance estimates of zero seem to happen every second analysis in group-randomized trials when we are using SAS or similar software. We spend a ton of time trying to fix the resulting inferences. But as a Bayesian, I spend my time on modeling data, not on fixing the problems with ML inference.
In Bayesian inference, we put a prior density on an open set such as (0,1) for a probability or the positive real line for a variance. Posterior densities live on the same open set, and Bayesian inference works fine. For ML inference, estimates live on the closed set: probabilities may be estimated in the closed set [0,1] and variances are estimated on zero plus the positive real line. The ML paradigm creates estimates and then uses those estimates in further needed calculations, such as for standard errors and for confidence intervals. ML estimates on the boundary can't be used in the usual SE or CI formulas; they do not come with natural standard error estimates. This causes enormous headaches for ML inference, though it appears to be a blessing for statisticians who need tenure, as it gives them plenty of grist for writing papers about cones and all sorts of cool (or obscure, depending on your viewpoint) mathematics.
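Here is the Bayesian side of the same y = 0 out of n = 20 example, again as a hedged sketch. The uniform Beta(1, 1) prior on (0,1) is my assumption for illustration, not a recommendation, and the function name is made up. With y = 0 the posterior is Beta(1, b) with b = 1 + n, whose CDF is 1 - (1 - x)^b, so the quantiles come out in closed form and no special software is needed.

```python
import math

def posterior_summary_zero_successes(n, prior_b=1.0, level=0.95):
    """Posterior summary for pi after y = 0 successes in n trials.

    Assumes a Beta(1, prior_b) prior (uniform when prior_b = 1), so the
    posterior is Beta(1, b) with b = prior_b + n.
    """
    b = prior_b + n
    mean = 1 / (1 + b)
    sd = math.sqrt(b / ((1 + b) ** 2 * (2 + b)))

    def quantile(p):
        # Inverts the Beta(1, b) CDF, F(x) = 1 - (1 - x)^b.
        return 1 - (1 - p) ** (1 / b)

    tail = (1 - level) / 2
    return mean, sd, (quantile(tail), quantile(1 - tail))

mean, sd, interval = posterior_summary_zero_successes(20)
print("Posterior mean:", round(mean, 4))
print("Posterior SD:", round(sd, 4))
print("95% interval:", tuple(round(q, 4) for q in interval))
```

The posterior mean, SD, and both interval endpoints all sit strictly inside (0,1). Nothing lands on the boundary, so there is nothing to fix.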
It's a matter of support. In Bayesian inference you have to specify the support of the prior density. In classical statistics, getting the support right is usually a matter of finding out your approach is giving silly answers and you need to fix the approach. These fixes are ad hoc and require lots of papers to be published, creating an inflated working class of (inflated) tenured frequentist statisticians. :-) As the silliness gets more subtle, more work goes into the fixes than into rejecting the approach in the first place. Go back in time to yesteryear when the method of moments was popular. A serious problem occurred in variance component models: the method gave negative estimates of variance. Negative estimates of variance! "But," sputters the frequentist statistician, "we don't do that any more." Fine, but you've still got a support problem.
ML statisticians should start wearing support hose to help support their support problem.