How to Prepare for a PhD in Biostatistics

How would you advise an undergraduate interested in a PhD to prepare for studying biostats?

More math. You can't be too rich or know too much math. In terms of courses, take more mathematics, take as much as you can. When you get into our biostatistics graduate program, we teach you statistics, so taking more statistics now won't help you in the long run, plus we may have to un-teach something if you learn statistics badly.

In terms of what math courses to take, try to take real analysis. Advanced linear algebra is very helpful. Every part of math is useful somewhere in statistics, though the connections may be obscure or, more likely, just not part of some current fad. Numerical analysis and combinatorics are also helpful, and everything else is helpful too. First, though, come real analysis and advanced linear algebra.

What might I read outside of my courses?

Start picking up books and articles that relate to statistics, math, science, and public health and read them. Lately there have been a number of excellent popular science books that relate to science, statistics and statistical thinking. Anything and everything by Malcolm Gladwell I highly recommend. Other books that come to mind are things like Stephen Jay Gould's essays; The Signal and the Noise by Nate Silver; The Theory That Would Not Die by McGrayne; any of the books by Jared Diamond. A very important book for most anyone: Thinking, Fast and Slow by Kahneman. The Black Swan by Nassim Taleb; How to Lie With Statistics by Darrell Huff.

For someone not yet expert in statistics, books on statistical graphics are directly statistical and will be much more accessible than a technical book. Books on statistical graphics will directly make you a better statistician, now. They teach you both to look at data and how to look at data. There's a set of four books by Edward Tufte. See http://www.edwardtufte.com/tufte/ . Get all four. I recommend hardcover over paperback, and I definitely wouldn't get the e-books. Read all four! These are not always practical books, but they inspire us to do our best and to be creative in our statistical and graphical analyses.

Read Visualizing Data by William Cleveland (A+ wonderful book). Additional graphics books include Graphical Methods for Data Analysis by Chambers, Cleveland, Kleiner, and Tukey, if you can find a copy. A more statistical book that will help instill the proper attitude about data is Exploratory Data Analysis by Tukey.

Read anything else you can find to read. Read widely and diversely. As you get stronger in math and statistics, change the level of the books. Start exploring the literature. Dive into one area and read as much as you can. Then find another area and check it out.

Can I depend on the department to teach me everything I need to be a good statistician?

Of course not. Active learning is paramount. No graduate department will teach you everything. All departments teach a core set of material, and it is up to students to supplement that core with additional material. How you supplement that core determines what kind of statistician you can be and how far you can go. Some people might supplement their core material with an in-depth study of non-parametrics; others with Bayesian methods, statistical computing, spatial data analysis or clinical trials. I supplemented my graduate education with statistical graphics, Bayesian methods, statistical computing, regression methods, hierarchical models, semi-parametric modeling, foundations and longitudinal data analysis. The semi-parametric modeling, graphics and computing mostly came from books. The longitudinal data analysis came from a mix of books and journal articles. Bayesian methods and hierarchical models I learned mostly from journal articles. Foundations came from talking to people and listening to seminars, as well as from journal articles and books. I also tried to learn additional mathematical statistics using various texts, but wasn't very successful; similarly with optimal design.

How you supplement your education depends on your interests and may help you refine your interests. I found I wasn't that interested in time series, survey sampling, stochastic processes or optimal design. If you're interested in working with a particular professor, you're going to need to supplement with books in her/his area, and you're going to need to read that professor's research papers to see what you're going to be getting into.

What programming language(s) should I learn?

R is growing fast and may take over, sort of like kudzu, so it is well worth your time to become expert in it. Definitely learn/use RStudio. Some folks make a living just off their R expertise. A lot more make a living off their SAS expertise. But I bet the R people are having a lot more fun. The rest of this is what I garner from others, not from direct knowledge. If you want less of a statistics specialty language and to be closer to the computer end of things, C++ and Java are extremely popular (you should figure out why). Python seems to be coming on very strong. So maybe R and Python? Depends on what you like. Learn something about algorithms and something about modern computer programming interfaces. And a little HTML.

Go learn LaTeX now. Become at least a partial expert. Knowing LaTeX before you come in to grad school is very helpful.

Some Questions from an Undergraduate for a Biostatistics Graduate Admissions Chair

These are questions I answered recently for an undergrad who had questions about how best to prepare herself for graduate school. I thought the answers might be more widely of interest, and with a lot of editing, am including the questions and answers here. Disclaimer: Your mileage may vary. A lot. 

What do graduate programs look for in an applicant? How does admissions work? 

Different programs differ immensely. I imagine that what got me into my graduate program only got me in because I happened to be taking a course from the department chair the quarter I applied to graduate school. My own grad school application would not get me into the UCLA Biostat PhD program at this time, though it might get me into our MS program. [There's a pretty harsh moral in there somewhere.] Biostat programs often have fewer math requirements than statistics programs do, but they also may or may not teach as much math stat as a stat program. Ours does teach plenty of math stat. A lot of programs (stat and biostat alike) admit to the MS, then pass you through to the PhD program if you do well. We admit a few people to the PhD, many more to the MS, and we admit to the PhD from our own MS. I believe this is how many programs work. (Another model is that some programs admit only PhD students, but if people don't make it through the PhD, they are sent off with an MS degree as a consolation prize.) There are no doubt other models.

Is it more important that I take particular mathematics courses before sending in my application or get good grades in the ones that I do take?

I'd vote for good grades. A good grade when you've only taken one or two (upper division) math courses is an 'A'. If you're going for a PhD you'll need to show that you can do PhD work though, and that means taking a few difficult math courses. Undergrads usually don't take what I would call "really difficult statistics courses". But that is certainly university/program specific. If you take a lot of math classes, then you have enough to average out the occasional bad grade. But don't ask me to tell you how to balance the occasional bad grade (i.e. a B) against more mathematics courses.

What do you look for in a graduate student?

You need to put in many hours (confer Outliers, by Malcolm Gladwell) to get really good at something. So putting in the time is worthwhile, starting now. For example, no graduate program teaches you everything about a subject; if you're going to have a well-rounded education after receiving your PhD, you're going to have to teach yourself more than you learn in class. How do you teach yourself material? Start now: trial and error, learn stuff, do badly on some things, better on others, and most importantly, keep on trying. So that's what I really want in a doctoral student: curiosity, an ability to figure stuff out, an inquiring mind, stick-to-itiveness, someone interesting we'd really like to have around for several years. Sadly that's not what is tested for on the GRE.

Do you, as an admissions officer, look at the courses applicants are "currently enrolled in" even if there is not a grade attached to that course?

Yes, absolutely. But mainly when it matters to my admissions decision. Here's one hypothetical example: Consider someone with all B's in their first 2 years of college who suddenly 'finds themself' and gets all A's their junior year; if they are a math major, I really want to know about those Fall grades. If they are all A's that person could be considered for the PhD program. If they are a mix of B's and some A's, maybe I'm willing to admit to the MS program only. Someone with all A's in their first three years of undergrad (yes, they exist), I probably don't need to see senior year Fall grades to make an admissions decision.

Here's another common hypothetical: someone who qualifies for one degree program, but is enrolled in courses that, if she gets A's in them, would qualify her for a different degree program. And suppose she prefers that 'different degree program'. Then I'm looking at the courses, and I actually need to wait to see the grades before I can make a sensible admissions decision.

What math courses specifically do you look for in a student's application/ suggest I take before I apply?

For the MS program you need to take UCLA's Math 31AB, 32AB and 33AB sequence. [Those of you elsewhere can look up what those courses are and find the matching courses at your own university.] Next are any and all junior/senior level math classes. Once people are enrolled in our MS program, but supposing they are interested in the PhD program, there isn't much time to take many math courses, so we basically only recommend that they take real analysis 131a (and 131b) and linear algebra (115a and sometimes 115b). I guess those qualify as the courses to take if you can only manage a small number of courses. But a lot of junior/senior level math courses can be useful, depending on what sort of statistics one ends up in. Also, real analysis can be a real bear of a course, and it may be helpful to ease into it by taking something else junior/senior level first. For direct admission to the PhD program, more math (with good grades, natch) is always better. Remember: You can never be too thin, too rich or know too much math.