How to Prepare for a PhD in Biostatistics

How would you advise an undergraduate interested in a PhD to prepare for studying biostats?

More math. You can't be too rich or know too much math. In terms of courses, take as much mathematics as you can. Once you are in our biostatistics graduate program, we will teach you statistics, so taking more statistics now won't help you in the long run, and we may have to un-teach something if you learned statistics badly.

As for which math courses to take, try to take real analysis. Advanced linear algebra is very helpful. Every part of math is useful somewhere in statistics, though the connections may be obscure or, more likely, just not part of some current fad. Numerical analysis and combinatorics are also helpful, and everything else is helpful too. First, though, come real analysis and advanced linear algebra.

What might I read outside of my courses?

Start picking up books and articles that relate to statistics, math, science, and public health, and read them. Lately there have been a number of excellent popular books that relate to science, statistics and statistical thinking. I highly recommend anything and everything by Malcolm Gladwell. Other books that come to mind are Stephen Jay Gould's essays; The Signal and the Noise by Nate Silver; The Theory That Would Not Die by Sharon Bertsch McGrayne; any of the books by Jared Diamond; The Black Swan by Nassim Taleb; and How to Lie With Statistics by Darrell Huff. A very important book for almost anyone is Thinking, Fast and Slow by Daniel Kahneman.

For someone not yet expert in statistics, books on statistical graphics are directly statistical and much more accessible than a technical book. They will make you a better statistician right now, because they teach you both to look at data and how to look at data. There is a set of four books by Edward Tufte; see http://www.edwardtufte.com/tufte/. Get all four. I recommend hardcover over paperback, and I definitely wouldn't get the e-books. Read all four! These are not always practical books, but they inspire us to do our best and to be creative in our statistical and graphical analyses.

Read Visualizing Data by William Cleveland (A+, a wonderful book). Additional graphics books include Graphical Methods for Data Analysis by Chambers, Cleveland, Kleiner, and Tukey, if you can find a copy. A more statistical book that will help instill the proper attitude about data is Exploratory Data Analysis by Tukey.

Read anything else you can find to read. Read widely and diversely. As you get stronger in math and statistics, change the level of the books. Start exploring the literature. Dive into one area and read as much as you can. Then find another area and check it out.

Can I depend on the department to teach me everything I need to be a good statistician?

Of course not. Active learning is paramount. No graduate department will teach you everything. All departments teach a core set of material, and it is up to students to supplement that core with additional material. How you supplement that core determines what kind of statistician you can be and how far you can go. Some people might supplement their core material with an in-depth study of non-parametrics; others with Bayesian methods, statistical computing, spatial data analysis or clinical trials. I supplemented my graduate education with statistical graphics, Bayesian methods, statistical computing, regression methods, hierarchical models, semi-parametric modeling, foundations and longitudinal data analysis. The semi-parametric modeling, graphics and computing mostly came from books. The longitudinal data analysis came from a mix of books and journal articles. Bayesian methods and hierarchical models I learned mostly from journal articles. Foundations came from talking to people and listening to seminars, as well as from journal articles and books. I also tried to learn additional mathematical statistics from various texts, but wasn't very successful; the same went for optimal design.

How you supplement your education depends on your interests and may help you refine your interests. I found I wasn't that interested in time series, survey sampling, stochastic processes or optimal design. If you're interested in working with a particular professor, you're going to need to supplement with books in her/his area, and you're going to need to read that professor's research papers to see what you're going to be getting into.

What programming language(s) should I learn?

R is growing fast and may take over, sort of like kudzu, so it is well worth your time to become expert in it. Definitely learn and use RStudio. Some folks make a living just off their R expertise. A lot more make a living off their SAS expertise. But I bet the R people are having a lot more fun. The rest of this is what I gather from others, not from direct knowledge. If you want less of a statistics specialty language and to be closer to the computer end of things, C++ and Java are extremely popular (you should figure out why). Python seems to be coming on very strong. So maybe R and Python? It depends on what you like. Learn something about algorithms and something about modern computer programming interfaces. And a little HTML.

Go learn LaTeX now. Become at least a partial expert. Knowing LaTeX before you come to grad school is very helpful.
