Open and Closed Intervals: A Problem for ML Inference But Not Bayes

Does maximum likelihood inference have a support problem? 

Maximum likelihood (ML) has a problem with parameters that take values in open sets (Is that all of them? Almost!). Bayesian inference doesn't obviously have this problem. 

Briefly: using maximum likelihood, how many of you out there know how to put a standard deviation/standard error or a confidence interval on a probability when you get a result of y = 0 out of n = 20 independent Bernoulli(\pi) trials? How about when you get a variance estimate of \tau-hat = 0 in a random intercept model for longitudinal data? Self-check: I know how to set a CI for \pi; I did it once on a test too, fortunately a take-home test. But I don't know how to do this for the variance parameter. Maybe I could figure it out, inverting a likelihood ratio test or something. And I haven't a clue how to construct an SE in these situations. Even if we can solve the problem for \pi and \tau separately, the serious problem is constructing a confidence interval for \pi_1 - \pi_2, the difference in probability of success between treatment and control groups. For the random intercept model, the problem is comparing the treatment mean \mu_2 to the control mean \mu_1, where we've randomized groups to treatment or control and we need to estimate the intraclass correlation coefficient to properly estimate the SE of the estimated mean difference. 

But these difficulties are not problems I ever have using Bayesian inference. Never. Not once. Data, yes: I've had y = 0 out of n = 20 walk in the door as part of a short-term consulting problem. And variance estimates of zero seem to happen every second analysis in group randomized trials when we are using SAS or similar software. We spend a ton of time trying to fix the resulting inferences. But as a Bayesian, I spend my time on modeling data, not on fixing the problems with ML inference.

In Bayesian inference, we put a prior density on an open set such as (0,1) for a probability or the positive real line for a variance. Posterior densities live on the same open set -- Bayesian inference works fine. For ML inference, estimates live on the closed set: probabilities are estimated in the closed interval [0,1] and variances on zero plus the positive real line. The ML paradigm creates estimates and then uses those estimates in further needed calculations, such as standard errors and confidence intervals. ML estimates on the boundary can't be used in the usual SE or CI formulas; they do not come with natural standard error estimates. This causes enormous headaches for ML inference, though it appears to be a blessing for statisticians who need tenure, as it gives them plenty of grist for writing papers about cones and all sorts of cool (or obscure, depending on your viewpoint) mathematics. 
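To make the y = 0 out of n = 20 example concrete, here is a small Python sketch of my own (an illustration, not anyone's canonical method): the ML estimate and its Wald standard error both collapse to zero, so the Wald interval degenerates to a point, while a uniform Beta(1,1) prior gives a Beta(1, 21) posterior on the open interval (0,1) with a perfectly sensible credible interval. For y = 0 the posterior quantiles even have a closed form, so no special software is needed.

```python
import math

# y = 0 successes out of n = 20 Bernoulli trials.
y, n = 0, 20

# ML estimate and the usual Wald standard error: both collapse to zero,
# so the Wald 95% CI degenerates to the single point [0, 0].
pi_hat = y / n
wald_se = math.sqrt(pi_hat * (1 - pi_hat) / n)

# Bayes with a uniform Beta(1, 1) prior: the posterior is
# Beta(1 + y, 1 + n - y) = Beta(1, 21), which lives on (0, 1).
# For y = 0 the posterior CDF is 1 - (1 - pi)^(n + 1), so
# quantiles have a closed form.
def beta_quantile_y0(q, n):
    """Quantile of Beta(1, n + 1), the posterior when y = 0 under a uniform prior."""
    return 1 - (1 - q) ** (1 / (n + 1))

lower = beta_quantile_y0(0.025, n)
upper = beta_quantile_y0(0.975, n)

print(f"ML: pi_hat = {pi_hat}, Wald SE = {wald_se}")
print(f"Bayes: 95% credible interval = ({lower:.4f}, {upper:.4f})")
# The credible interval is roughly (0.0012, 0.1611): strictly inside (0, 1).
```

The point of the closed form is just transparency; with a non-uniform Beta prior you would use a numerical Beta quantile function instead, and the conclusion is the same.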

It's a matter of support. In Bayesian inference you have to specify the support of the prior density. In classical statistics, getting the support right is usually a matter of discovering that your approach is giving silly answers and that you need to fix the approach. These fixes are ad hoc and require lots of papers to be published, creating an inflated working class of tenured frequentist statisticians. :-) As the silliness gets more subtle, more work goes into the fixes than into rejecting the approach in the first place. Go back in time to yesteryear when the method of moments was popular. A serious problem occurred in variance component models: negative estimates of variance. Negative estimates of variance! "But," sputters the frequentist statistician, "we don't do that any more." Fine, but you've still got a support problem. 

ML statisticians should start wearing support hose to help support their support problem. 

Guns are the Tools that People Use Most Often to Kill People

Intro. A good friend has a habit of re-posting annoying and silly claims about guns on Facebook. No, I don't want to de-friend him. Mostly I ban the source of the claims, but he finds new sources of material. When I attempt to verify some claims, they are easily refuted by a quick Google search and the first link to Wikipedia.

The most specious of gun-related claims is 'guns don't kill people, people kill people'. If you want to play it that way, fine. Cars don't drive people, people drive people! Scissors don't cut paper, people cut paper! Knives don't chop vegetables, people chop vegetables! A gun is a tool, as are cars, scissors and knives. I use my car most days to transport me around town; I use scissors perhaps weekly to open mail or cut sheets of paper to needed size; I use knives for daily preparation and consumption of food. Cars, scissors and knives are tools that occasionally get used to kill people. Still, soccer moms/dads and taxi drivers don't intentionally kill many folks with their cars, second graders use scissors for art projects without managing to kill anyone, and chefs and cooks and food-eaters of all kinds use knives without stopping someone's beating heart. 

Data. Being a statistician, what I did was get data from Wikipedia on numbers of murders and population broken out by state. Wikipedia got its data from the FBI Uniform Crime Reporting Statistics. The FBI data for 2010 seemed to be missing Florida, but Florida publishes the data itself, and the Florida and FBI sources match the data in the Wikipedia article. That particular Wikipedia article also gave 2010 state populations. I subtracted gun murders from total murders to get non-gun murders by state and calculated gun and non-gun murder rates.

Results. The first figure plots numbers of murders (gun and non-gun separately) in a state against state population on a log-log plot with lowess curves (iter=0) for both murder types. Gun murder counts are generally greater than murder counts using all other tools combined. For most of the state population range, the lowess curves are linear and parallel, suggesting that the number of murders of either type goes up as a power of the population. Fitting a simple GLM suggests the power is about 1.1 for both gun and non-gun murders, only slightly larger than 1. 
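The log-log slope idea can be sketched in a few lines of Python. The data below are synthetic, generated to mimic the pattern described above (counts proportional to population to the 1.1 power, times noise) -- the real analysis fit a GLM to the FBI data, and here I substitute an ordinary least-squares slope on the log-log scale as a stand-in, so treat this as an illustration of why a straight lowess line on a log-log plot implies a power law, not a reproduction of the result.

```python
import math
import random

# Synthetic illustration only -- not the FBI data. Generate state-like
# populations and murder counts following counts ~ c * population^1.1,
# with multiplicative noise, then recover the power from the slope of
# a least-squares fit on the log-log scale.
random.seed(1)
true_power = 1.1
pops = [random.uniform(0.5e6, 40e6) for _ in range(50)]
counts = [1e-6 * p ** true_power * random.uniform(0.8, 1.25) for p in pops]

x = [math.log(p) for p in pops]
y = [math.log(c) for c in counts]

# Ordinary least squares slope: cov(x, y) / var(x).
m = len(x)
xbar, ybar = sum(x) / m, sum(y) / m
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))

print(f"fitted power = {slope:.2f}")  # close to the generating power of 1.1
```

A slope of 1 on this scale would mean murders grow proportionally with population; a slope slightly above 1, as in the text, means larger states have slightly more murders per capita.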

Only for the smallest population states do the two lowess curves intersect. Checking the data, eight, mostly northerly, mostly smaller states have non-gun murder counts higher than their gun-murder counts; these are Hawaii, Maine, New Hampshire, North Dakota, Oregon, Utah, Vermont and West Virginia. The total number of murders with all tools in those 8 states was 262, with 34 more non-gun murders than gun murders. In contrast, there were 13,540 murders in the other 42 states. 

We can also look at gun and non-gun murder rates by state; these are given in the second figure. Non-gun murder rates barely increase with population size, while gun murder rates do increase with population size until somewhere just past 5 million people, when the rate of growth tapers off. 

Conclusion. For 2010 there were 9,304 gun murders and 4,498 non-gun murders in the 50 United States. In the United States in 2010, the gun is the primary tool people use for killing people. People use guns to kill people more than twice as often as they use all other tools combined. People use guns to kill people. 

More on Writing, Guest Post

This from Dr. Robert Bolan of the LAGLC. 

I agree with Rob’s choices of writing references. Strunk & White and Zinsser are indispensable and, perhaps not so surprisingly, they are written well enough that they can actually be read and not only used as quick lookup sources. Of course there are others, but these are touchstones of proper English grammar and word usage.

So much of good writing, as Rob suggests, is trying to achieve absolute clarity with the words you choose and how you string them together. Economy is a sacred principle in good writing. Use the right words and use as few as possible. Also, rearrange sentences to get the flow right. For guidance on these skills I like Getting the Words Right: How to Revise, Edit & Rewrite by Theodore A. Rees Cheney. For assistance in technical writing there are several references. I like Merriam-Webster’s Manual for Writers & Editors. For sheer brilliance and clarity of advice, check out Robertson Davies’ Reading and Writing, a slim volume you can read in two or three sessions (read slowly, let sink in, do not gulp this one down). And finally, I offer my fervent belief that scientific writing, although requiring parsimony and precision, need not be dry and devoid of style. Read anything by John Gardner on writing, Stephen King on writing, Eudora Welty on writing, or any novelist or essayist whose style you admire. And then when you’re done with all that, read Paradise Lost—aloud—not for comprehension but for the sheer thunderous music of it. Philip Pullman, who wrote the introduction to my edition of Milton’s masterpiece, remarked that "the experience of reading poetry aloud when you don’t fully understand it is a curious and complicated one. It’s like suddenly discovering you can play the organ." You will likely be thinking that poetry has nothing to do with scientific writing. I disagree. All writing exists for the purpose of communication. Again, scientific writing need not be sterile—although that often appears to be the gold standard for editors. If you have something important to say, you must say it clearly, of course. But cadence and musicality, sparingly used, can deliver your meaning with an elegance that will, unbeknownst to the reader, nestle it into place with crystal clarity. Compose, don’t just write.

Obsessive attention to nuance and detail in writing can be a curse as well as a virtue, and every true writer can identify with the following. A friend of Oscar Wilde’s is reported to have asked him what he did yesterday.  Wilde replied: "In the morning I took out a comma. In the afternoon I put it back in again."

Me again regarding this last: A wonderful short essay on being too critical of yourself early in the writing process is Gail Godwin's The Watcher at the Gate. 

Some Questions from an Undergraduate for a Biostatistics Graduate Admissions Chair

These are questions I answered recently for an undergrad who asked how best to prepare herself for graduate school. I thought the answers might be of wider interest, and, with a lot of editing, am including the questions and answers here. Disclaimer: Your mileage may vary. A lot. 

What do graduate programs look for in an applicant? How does admissions work? 

Different programs differ immensely. I imagine I got into my own graduate program only because I happened to be taking a course from the department chair the quarter I applied. My own grad school application would not get me into the UCLA Biostat PhD program at this time, though it might get me into our MS program. [There's a pretty harsh moral in there somewhere.] Biostat programs often have fewer math requirements than statistics programs do, but they also may or may not teach as much math stat as a stat program. Ours does teach plenty of math stat. A lot of programs (stat and biostat alike) admit to the MS, then pass you through to the PhD program if you do well. We admit a few people to the PhD, many more to the MS, and we admit to the PhD from our own MS. I believe this is how many programs work. (Another model is that some programs admit only PhD students, but if people don't make it through the PhD, they are sent off with an MS degree as a consolation prize.) There are no doubt other models.    

Is it more important that I take particular mathematics courses before sending in my application or get good grades in the ones that I do take?

I'd vote for good grades. A good grade when you've only taken one or two (upper division) math courses is an 'A'. If you're going for a PhD you'll need to show that you can do PhD work, though, and that means taking a few difficult math courses. Undergrads usually don't take what I would call "really difficult statistics courses". But that is certainly university/program specific. If you take a lot of math classes, then you have enough to average out the occasional bad grade. But don't ask me to tell you how to balance the occasional bad grade (i.e., a B) against more mathematics courses.  

What do you look for in a graduate student?

You need to put in many hours (cf. Outliers by Malcolm Gladwell) to get really good at something. So putting in the time is worthwhile, starting now. For example, no graduate program teaches you everything about a subject; if you're going to have a well-rounded education after receiving your PhD, you're going to have to teach yourself more than you learn in class. How do you teach yourself material? Start now: trial and error, learn stuff, do badly on some things, better on others, and most importantly, keep on trying. So that's what I really want in a doctoral student: curiosity, an ability to figure stuff out, an inquiring mind, stick-to-it-iveness, someone interesting we'd really like to have around for several years. Sadly, that's not what is tested for on the GRE. 

Do you, as an admissions officer, look at the courses applicants are "currently enrolled in" even if there is not a grade attached to that course?

Yes, absolutely. But mainly when it matters to my admissions decision. Here's one hypothetical example: consider someone with all B's in their first 2 years of college who suddenly 'finds themself' and gets all A's their junior year; if they are a math major, I really want to know about their senior-year Fall grades. If those are all A's, that person could be considered for the PhD program. If they are a mix of B's and some A's, maybe I'm willing to admit to the MS program only. For someone with all A's in their first three years of undergrad (yes, they exist), I probably don't need to see senior-year Fall grades to make an admissions decision.

Here's another common hypothetical: someone who qualifies for one degree program but is enrolled in courses that, if she gets A's in them, would qualify her for a different degree program. And suppose she prefers that different degree program. Then I'm looking at the courses, and I actually need to wait to see the grades before I can make a sensible admissions decision. 

What math courses specifically do you look for in a student's application/ suggest I take before I apply?

For the MS program you need to take UCLA's Math 31AB, 32AB and 33AB sequence. [Those of you elsewhere can look up what those courses are and find the matching courses at your own university.] Next is any/every junior/senior level math class. Once people are enrolled in our MS program but are interested in the PhD program, there isn't a lot of time to take many math courses, so we basically recommend only that they take real analysis (Math 131A and 131B) and linear algebra (115A and sometimes 115B). I guess those qualify as the courses to take if you can only manage a small number of courses. But a lot of junior/senior level math courses can be useful, depending on what sort of statistics one ends up in. Also, real analysis can be a real bear of a course, and it may be helpful to ease into it by taking something else at the junior/senior level first. For direct admission to the PhD program, more math (with good grades, natch) is always better. Remember: you can never be too thin, too rich, or know too much math. 

Where Have All the Tenured Women Gone?

Ingram Olkin has an excellent editorial about gender equality in statistics departments in the US.  Everyone should read it. 

Each university has its own business structure, and UCLA has its own. Biostatistics also differs from statistics in a lot of important respects, particularly with regard to soft and hard money. I looked at UCLA Biostat based on information from our web site. Counting everyone listed as Professor in our directory, we had 2 female out of 17 full professors, 4/4 associate professors, and 3/5 assistant professors. These numbers include joint and secondary appointments, and part-time and non-tenured professors. Doing my best to count faculty with tenure in Biostat, I get 1/8 full, 2/2 associate and 0/1 assistant professors. It may sound funny to hedge ("Doing my best"), but I really didn't know how many people we had until I counted just now. For the tenured count, I again took my best guess. (Mistakes may get made!) One full male prof is actually split between two departments, so saying we have 1 female out of 7.5 tenured full professors might be slightly more accurate. 

UCLA Biostat may not be doing too badly at the junior ranks in terms of gender diversification. Numbers are obviously not the whole story; atmosphere, support, and opportunities also matter immensely. Senior faculty ranks are clearly lopsidedly male.

We all have to continue to actively support gender equality in hiring and promotion. Semper Vigilo.