How to Prepare for a PhD in Biostatistics

How would you advise an undergraduate interested in a PhD to prepare for studying biostats?

More math. You can't be too rich or know too much math. In terms of courses, take more mathematics, take as much as you can. When you get into our biostatistics graduate program, we teach you statistics, so taking more statistics now won't help you in the long run, plus we may have to un-teach something if you learn statistics badly.

In terms of what math courses to take, try to take real analysis. Advanced linear algebra is very helpful. Every part of math is useful somewhere in statistics, though connections may be obscure, or more likely, just not part of some current fad. Numerical analysis and combinatorics are also helpful, and everything else is helpful too. First, though, come real analysis and advanced linear algebra.

What might I read outside of my courses?

Start picking up books and articles that relate to statistics, math, science, and public health and read them. Lately there have been a number of excellent popular science books that relate to science, statistics and statistical thinking. Anything and everything by Malcolm Gladwell I highly recommend. Other books that come to mind are things like Stephen Jay Gould's essays; The Signal and the Noise by Nate Silver; The Theory That Would Not Die by McGrayne; any of the books by Jared Diamond. A very important book for most anyone: Thinking, Fast and Slow by Kahneman. The Black Swan by Nassim Taleb; How to Lie With Statistics by Darrell Huff.

For someone not yet expert in statistics, books on statistical graphics are directly statistical and will be much more accessible than a technical book. Books on statistical graphics will directly make you a better statistician, now. These teach you both to look at data and how to look at data. There's a set of 4 books by Edward Tufte. See http://www.edwardtufte.com/tufte/ . Get all four. I recommend hardcover over paperback, and I definitely wouldn't get the e-books. Read all four! These are not always practical books, but they inspire us to do our best and to be creative in our statistical and graphical analyses.

Read Visualizing Data by William Cleveland (A+ wonderful book). Additional graphics books include Graphical Methods for Data Analysis by Chambers, Cleveland, Kleiner and Tukey, if you can find a copy. A more statistical book that will help instill the proper attitude about data is Exploratory Data Analysis by Tukey.

Read anything else you can find to read. Read widely and diversely. As you get stronger in math and statistics, change the level of the books. Start exploring the literature. Dive into one area and read as much as you can. Then find another area and check it out.

Can I depend on the department to teach me everything I need to be a good statistician?

Of course not. Active learning is paramount. No graduate department will teach you everything. All departments teach a core set of material, and it is up to students to supplement that core with additional material. How you supplement that core determines what kind of statistician you can be, how far you can go. Some people might supplement their core material with an in-depth study of non-parametrics; others with Bayesian methods, statistical computing, spatial data analysis or clinical trials. I supplemented my graduate education with statistical graphics, Bayesian methods, statistical computing, regression methods, hierarchical models, semi-parametric modeling, foundations and longitudinal data analysis. The semi-parametric modeling, graphics and computing mostly came from books. The longitudinal data analysis came from a mix of books and journal articles. Bayesian methods and hierarchical models I learned mostly from journal articles. Foundations came from talking to people and listening to seminars, as well as from journal articles and books. I also tried to learn additional mathematical statistics using various texts, but wasn't very successful; similarly with optimal design.

How you supplement your education depends on your interests and may help you refine your interests. I found I wasn't that interested in time series, survey sampling, stochastic processes or optimal design. If you're interested in working with a particular professor, you're going to need to supplement with books in her/his area, and you're going to need to read that professor's research papers to see what you're going to be getting into.

What programming language(s) should I learn?

R is growing fast and may take over, sort of like kudzu, so it is well worth your time to become expert in it. Definitely learn/use RStudio. Some folks make a living just off their R expertise. A lot more make a living off their SAS expertise. But I bet the R people are having a lot more fun. The rest of this is what I garner from others, not from direct knowledge. If you want less of a statistics specialty language and to be closer to the computer end of things, C++ or Java are extremely popular (you should figure out why). Python seems to be coming on very strong. So maybe R and Python? Depends on what you like. Learn something about algorithms and something about modern computer programming interfaces. And a little HTML.

Go learn LaTeX now. Become at least a partial expert. Knowing LaTeX before you come in to grad school is very helpful.

What do I do? How do I apply statistics in my job? How did I get started?

I've been invited to a panel discussion by the UCLA undergraduate statistics club. Some of the questions I was told to expect are down below. By answering the questions here, there's a chance of a more literate answer and other students will be able to read the answers as well. 

What do you do on a day-to-day basis?

I'm not sure there's a day-to-day answer to this question! My days are quite varied and full. Some constants are:

  • Teaching classes, office hours, answering student emails. My classes are (i) longitudinal data analysis, (ii) Bayesian data analysis, (iii) multivariate analysis, (iv) statistical graphics. I occasionally present a one- or two-day short course on longitudinal analysis.
  • Helping my non-statistician colleagues with their scientific research in many ways: 
    • Applying appropriate statistical methodologies in the analyses of their data.
    • Training their graduate students and biostat graduate students in how to analyze their data, depending on who is analyzing the data.
    • Helping them design studies to collect the most useful data possible.
    • Helping them write grants to get money to do their research.
  • Advising my doctoral students on their dissertation work. This can include editing their writing, listening to where they are going with their research, making suggestions on where they might go with their research, advising on employment. 
  • Doing (bio)statistical research. Most of this is done jointly with my students and with some friends. It involves having an idea or three, writing the idea down, working out the details, running examples, writing the paper, then submitting the paper and nursing the paper through the submission process to acceptance. 
  • Lately I've been working on my statistics blog. 
  • Administrative jobs. Every business needs to be managed and run, and academia is no exception. I chair the admissions committee for our department and do other jobs around the department.
  • A number of students from around the university have discovered my Bayes and longitudinal courses, and I get to talk to them about their cutting edge research in various disciplines. There's very little more fun than talking to a highly motivated young person about their research.
  • Refereeing biostatistics papers, and acting as an associate editor for Biometrics.

How do you apply statistics in your job?

  • Teaching statistics, analyzing data with my colleagues, advising doctoral students about statistics, designing studies, power calculations, developing new statistical models and computational methods.

How did you get started in statistics?

That's a long story and I was lucky.

As a junior at the University of Minnesota, I was tired of being in school and wanted to graduate and start a career. Problem was, I needed a major and my current major (physics) wasn't going to work. I opened my copy of the University catalog and I read the requirements for every department, starting at the letter A and working my way forward alphabetically to M. At M, I realized I had always liked mathematics and, even better, had been good at it. Better still, I could graduate in 12 months if everything (me and scheduling) worked exactly right. Later that day I called my roommate and told him I was going to major in math. His response was to recommend taking statistics so I could get a job. The idea was that being an actuary paid well, although it had a reputation of being rather dull. Graduation required that I take three year-long sequences, and I took mathematical statistics, probability theory and real analysis. Those choices turned out to be ideal preparation for graduate school in statistics. That fall I started mathematical statistics with Don Berry, a Bayesian statistician famous for his advocacy of adaptive Bayesian clinical trials. Don thought I showed promise and he recruited me to graduate school in statistics at the U of M.

After starting graduate school, I discovered that some of my past activities provided useful preparation for statistics. I was a game player; I'd play chess, backgammon and bridge every chance I got. From chess I got the ability to calculate and to look ahead and to predict. Backgammon and bridge teach probability and all three games teach understanding of other people and their motivations. From bridge I learned Bayesian thinking. One situation in bridge is called a finesse, with the goal of needing to find a particular queen in either your left or right hand opponent's hand. The instructions given to me were: if you think your left hand opponent has the queen, you play this way, if you think the right hand opponent has the queen, you play a different way. At the time, as a novice bridge player, I did not know what to do with that instruction. Later on, I realized that type of thinking was Bayesian in nature.

From backgammon I learned Monte Carlo simulation. Backgammon is a gambling game combining skill and chance. In any backgammon position played for monetary stakes, the value of the position is the amount you should pay an opponent if the game is ended at this point without completion. My friends and I would come across a particular position and wonder what the right move was. We would play the game out from the given position many times, first with one move, then again with the other move. Backgammon uses the roll of two six-sided dice to determine what moves can potentially be played at each turn; skill is used to pick the best among the allowed moves given the roll. In complex positions, the best (or correct) move may be unclear, and we would use Monte Carlo simulation of both move choices to determine the value of the game following each move. The move with the higher value is the better move.
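The rollout idea can be sketched in a few lines of Python. This is only a toy stand-in for backgammon, not the real game: each side is reduced to a pip count, doubles are played four times, the opponent rolls first after our move, and a "bold" move is modeled as one that sets the opponent back but risks a single return hit. The positions, the hit chance and the hit cost are all made-up numbers for illustration, and the "value" here is simply the estimated chance of winning, ignoring stakes, gammons and the doubling cube.

```python
import random


def roll_pips(rng):
    """Pips moved on one roll of two six-sided dice; doubles are played four times."""
    a, b = rng.randint(1, 6), rng.randint(1, 6)
    return 4 * a if a == b else a + b


def play_out(my_pips, opp_pips, hit_chance, hit_cost, rng):
    """Play one toy game to the end after our candidate move; True if we win.

    Assumption (not real backgammon): the only interaction between the two
    sides is a single chance, right after our move, that the opponent hits
    an exposed checker and adds hit_cost pips to our count.
    """
    if rng.random() < hit_chance:
        my_pips += hit_cost
    while True:
        # The opponent rolls first because we have just made our move.
        opp_pips -= roll_pips(rng)
        if opp_pips <= 0:
            return False
        my_pips -= roll_pips(rng)
        if my_pips <= 0:
            return True


def estimate_value(position, n_games=20000, seed=0):
    """Monte Carlo estimate of our winning chance from the given position."""
    rng = random.Random(seed)
    wins = sum(play_out(*position, rng) for _ in range(n_games))
    return wins / n_games


# Hypothetical positions reached by two candidate moves, written as
# (my_pips, opp_pips, hit_chance, hit_cost). The numbers are invented.
SAFE_MOVE = (70, 65, 0.0, 0)        # quiet move, no checker exposed
BOLD_MOVE = (70, 75, 11 / 36, 20)   # sets the opponent back, leaves a shot

if __name__ == "__main__":
    for name, position in [("safe", SAFE_MOVE), ("bold", BOLD_MOVE)]:
        print(name, estimate_value(position))
```

Whichever candidate gives the higher estimate is the better move under these toy assumptions. Playing more games shrinks the simulation noise, which matters when the two moves are close in value.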

Over my undergraduate years I had worked in two different high energy physics labs helping make detectors for high energy physics research. I also worked in a reading research psychology lab and I worked for a geophysicist who studied chemical compositions of meteorites to understand how the solar system came into being. The geophysics and reading research labs had expensive VAX computers running Unix for the sole purpose of data acquisition, management and analysis, except on weekends, when the VAX might be switched over to play Star Trek. The high energy physicists spent enormous sums (millions) on constructing equipment to collect data. Data was clearly very valuable and I gained a healthy respect for data, even if I didn't know much about it at the time.

Among other lessons, these experiences taught me that scientists had very strong opinions, and that those opinions might rule over the data on occasion. In psychology, I saw a well-respected senior researcher try to understand why an experiment came out wrong, and how to get it to come out right. He eventually reran the experiment with different subjects and it came out right. After I learned Bayesian statistics I had a language and tools to think about this behavior. I also learned that scientists get it wrong sometimes. I remember the geophysics research group standing around talking (while I listened) about how another highly respected research group at a different major university had published a paper in a major journal and had gotten the conclusion dead wrong. The important takeaways from this exposure to science and scientists were that data was important and valuable, but data was not everything. Opinions mattered deeply, yet scientists can make bad mistakes.

 
