Writing a Biostatistics Doctoral Dissertation Proposal

You have finished all your courses. You have passed your written comprehensive exams. Congratulations! What’s next? If you haven’t already (and you should have), you pick an advisor and start to work on your doctoral dissertation. Writing a dissertation and finishing your doctoral degree involves several steps. Two important steps in finishing your PhD are partially bureaucratic in nature: the preliminary oral and the final oral. These may well be the last two exams of your academic career. This blog post is about the preliminary oral exam and the dissertation proposal.

Depending on your university rules and department traditions, the specific steps in preparing a thesis will vary. Here I talk about my university, which is UCLA, my department, which is biostatistics, and this moment in time, which is late 2018. But please realize: the rules, procedures, and customs surrounding prelim oral exams and dissertation proposals have evolved over the decades. They are not fixed in stone, except by university and department written rules which can and do change. You can expect procedures to evolve over time and to vary by committee and most especially by dissertation advisor.

At UCLA, you have an advisor (you can have two, but one is most common) and you will pick a dissertation committee in conjunction with your advisor. You will prepare a dissertation proposal and hand (or email) it out to your dissertation committee members in advance of the exam. Two weeks in advance is a courtesy (we used to say 3 weeks). The committee meets and you present a talk on your proposal. The committee members ask you questions about your proposal and talk, which you answer to the best of your ability. The committee supplies feedback to you and your advisor about your proposed dissertation and makes a decision about you passing or failing the exam.

The preliminary oral exam in our department usually is scheduled to take 2 hours. We’re allowed 3 hours by University policy, but 3 hours is a lot of time for faculty these days, and 3 hours is exhausting for students to present and answer questions. And we haven’t noticed any benefit to the extra hour. So we try to restrict the exam to 2 hours. It is mostly your advisor’s responsibility to make sure the exam runs on time, but you may need to help your advisor out with this. If there are a lot of questions, and usually there are, then you won’t finish all your slides. Not finishing all your slides is possibly the norm, not the exception. You should prepare ahead of time various cuts that you can take without harming the narrative of your talk.

Talk to the student affairs officer to schedule a room for the exam. Often the exam takes place in our biostatistics library, but occasionally it can be an outside room depending on scheduling conflicts. Typically the room is scheduled for half an hour before the exam so that you can set up your computer and display equipment. The exam is for 2 hours, then you have the room for half an hour to dismantle the equipment, though usually it shouldn’t take that long.

The preliminary oral exam is closed, meaning only the student and the dissertation committee members are allowed in the room during the exam. One committee member who is not the chair or co-chair may connect remotely – I really don’t recommend that – but it is allowed.

Once the committee is fully assembled, the exam starts by you stepping out of the room while the committee meets without you. During this time, the chair will provide an assessment of how it has been working with you. The committee will discuss your academic background, your academic progress, and what they can expect from you in terms of progress and development. They may provide feedback on your proposal to your chair. The committee may also discuss the weather. This initial meeting may take from 5 to 20 minutes. When the committee is done, someone will call you back into the room. You will now start presenting. After the presentation and questions are finished, you will step out of the room once again while the committee discusses your presentation and provides your advisor with guidance. After the exam you will typically meet with your advisor to discuss the results of the exam and any comments from the committee.

During your presentation, the committee members will typically interrupt with questions. There are many purposes for these questions:

  • To clarify meaning, as in a short clarification or asking for a definition;
  • To see if you understand what you are saying;
  • To see if you can think on your feet and respond to something different;
  • To see if you understand or are aware of a particular reference;
  • To see if you can extend your work in response to a new idea;
  • To see if you can explain what you just said in a different way;
  • To see if you can answer a question in the middle of a talk.

Students often make assumptions about what the questions mean or imply, but this is usually a mistake and these assumptions are usually incorrect. Do not assume that a question means that the committee member disagrees with something you said. Nor should you assume that the committee member doesn’t understand you, even if the question starts with “I don’t understand …”. An angry question doesn’t mean the faculty member is actually angry with you – it more likely means they didn’t get enough sleep the night before or that that is their style. Far and away the best approach is to take the question at face value and answer as best you can.

It used to be that the committee had 5 members, but in these busy times, it has become impossible to get 5 faculty members into a single room at the same time, so the University has reduced the committee size to 4 faculty members. The rules on who can be on a committee are surprisingly complicated. For UCLA’s rules, see https://grad.ucla.edu/academics/doctoral-studies/minimum-standards-for-…. The general intent is to get enough faculty on the committee to supply additional expertise should the student need it, to provide faculty expertise to be able to confirm that your dissertation will be a new contribution to knowledge, and to ensure that the committee members have sufficient seniority to provide sensible guidance. At the same time, there is flexibility to find additional expertise if needed, including potentially going outside the university to find a committee member with needed expertise.

The biostatistics department has placed an additional constraint on dissertation committees: one of the faculty members must have a primary appointment from outside the department. This is intended to mean that someone with subject matter scientific expertise is included in the committee, not that someone from mathematics, statistics or computer science is placed on your committee. Our intent is that you explain to someone who is a scientist, and who is specifically not a statistician, what the tools you will develop in your dissertation are, and why they might potentially benefit the scientist. Explaining to a scientist what your tools are teaches you to explain your statistical tools non-technically and requires that you think about the scientist while you work on your dissertation. It’s not enough to say that you’ve improved the root mean squared error of some estimator – what good will you do for the scientist, and by extension, society with your dissertation work?

The dissertation proposal is a document that you write that tells the committee where you plan to go with your dissertation. This document can take many forms, and it may range in length from arbitrarily short to arbitrarily long, with 40 to 100 double-spaced pages being pretty common. The dissertation proposal has several purposes, though an individual document may not serve all purposes:

  • To show what you know. What you know might be demonstrated by a literature review for example. A long literature review is definitely not required, and is becoming rarer.
  • To show that you can handle the thesis topic. For example, by illustrating the results of example calculations that are similar to what needs to be done in the topic.
  • To show that you can do research.
    • The easiest way to do this is to actually show some novel model or novel results in the proposal.
    • For example, your advisor might have you start working on writing a first paper and that material would then show up in the dissertation proposal.
    • However, you might demonstrate research competence by demonstrating your awareness of past research and knowledge of currently unanswered questions.
  • To show the committee an outline of the future research you intend to undertake in your dissertation. This is the proposal part of the dissertation proposal.
    • This includes any novel research already undertaken that might be given in the proposal.
    • But typically this is unfinished work and is in outline form and shows the committee that you have an idea of where you are going and what you will do.
    • Outline form may mean a paragraph or two on each idea that you propose to execute during your dissertation research.

A very important part of the proposal is where you indicate what is old and what is novel. That is, what old material has already occurred in the literature, and what new material is your own novel work. You will be receiving a PhD because of your novel contributions to biostatistics. If you don’t indicate what is novel and what is old, you cannot expect your committee to understand this distinction. If you don’t indicate what is new, then it becomes up to the committee to figure out what is new, and they might err on the side that everything you said is review. Much easier if you tell them what is new.

In your proposal you should have a section that outlines the planned future research: a “proposal” section. The proposal section will sketch projects that you plan to tackle in your dissertation. I consider it important to have a worked numerical example that shows that you are able to compute with the sort of data and models and methods that you intend to develop in the dissertation. If you’ve submitted your first paper prior to the preliminary oral, you can add an introduction, a non-technical discussion of your paper and proposed additional work, and a proposal section and your proposal is ready for the committee.

Your committee members are expected to read your proposal but might not. If they read the proposal, you can assume they will read or skim the proposal the night before. Thus there is little value in checking in with faculty about issues or comments prior to your talk. Your talk needs to be self-contained, and should not depend on the committee having read the proposal.

The member who is not from biostatistics may well have difficulty reading the mathematical statistical portion of your proposal. So why did the department require you to have a non-statistician scientist on your committee? The reason is that we want you to be able to communicate the value of your statistical research to a non-statistician. Biostatistics has a substantial collaborative aspect to it. Thus, as part of your proposal, you should have a section that explains the value of your work in layman’s terms. This is a courtesy to the outside member of your committee, as well as being important in its own right. Similarly, as part of your dissertation, you should have a section or chapter that describes your contributions to biostatistics and science in non-technical language. This section should be completely accurate but not rely on mathematical notation or technical statistical jargon to make its points.

A dissertation can take many forms. A common form that is increasingly popular is to write three separate research papers and then bind them in dissertation format and submit these as the final dissertation. This is not required, and is decided upon primarily by the advisor, with input from the student and possibly the committee. The dissertation is supposed to be publishable, but if one merely writes a dissertation that contains three publishable ideas then it can take a long while to turn the dissertation into the three papers. In contrast, if one writes three papers, then it is quite quick to turn three papers into a dissertation, taking perhaps a few weeks at most, with time mostly spent on formatting your papers into the UCLA dissertation style. For students interested in academia these days, substantial ability to publish must necessarily be demonstrated, so having a good CV out of grad school with a number of publications published or in submission is necessary. The three papers model of a dissertation is required for those students. Similarly, many faculty require the three papers model. I assume this model for the dissertation in the remainder of this discussion.

When writing your proposal, there are a number of technical issues. You may be learning LaTeX, or even if you know LaTeX, you will need to learn new features to format your proposal properly. Similarly, you need to learn BibTeX to format your bibliography.
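
If you are starting from nothing, a minimal sketch of a proposal skeleton might look like the following. Everything specific here is an assumption for illustration: the file name proposal.tex, the citation key placeholder2018, and the bibliography entry itself are made up, and the choice of the natbib package with the plainnat style is just one common option, not a department or university requirement. Check with your advisor for the format they prefer.

```latex
% proposal.tex (hypothetical file name)
% The filecontents block writes a placeholder references.bib next to this file.
% The entry is a made-up example; replace it with your real references.
\begin{filecontents}{references.bib}
@article{placeholder2018,
  author  = {Author, Ann},
  title   = {A Placeholder Title},
  journal = {Placeholder Journal},
  year    = {2018},
  volume  = {1},
  pages   = {1--10}
}
\end{filecontents}

\documentclass[12pt]{article}
\usepackage{setspace}  % provides \doublespacing
\usepackage{natbib}    % provides \citet and \citep
\doublespacing

\title{Dissertation Proposal}
\author{Student Name}

\begin{document}
\maketitle

\section{Introduction}
Earlier work \citep{placeholder2018} motivates the proposed research.

\section{Proposed Research}
A paragraph or two on each idea goes here.

\bibliographystyle{plainnat}
\bibliography{references}  % reads references.bib
\end{document}
```

To build it, run pdflatex on proposal.tex, then bibtex on proposal, then pdflatex twice more so that the citations and the reference list resolve.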

You may not be used to reading technical papers in the statistics literature and you need to start doing this immediately. Those papers can be models for what your dissertation papers will be like. Further, these papers illustrate how to write technical material. Some papers are better written than others, so learn to be critical so that you can learn to write well. Well written technical prose is a signal to the reader that you take your job seriously as an author and it signals that your underlying work may well be worth their time to read. Similarly, formatting your text properly flags to the reader that you take your job of presenting your work seriously and that the underlying work is worth the reader’s time to read. Not formatting your proposal properly signals to your committee that either (a) you don’t take your work seriously, or (b) you don’t understand your tools (LaTeX and English) very well. Or both. And either of those highly correlates with weak or bad statistics. [Bad = wrong, weak = very little new.]

I have read a large number of papers submitted for publication in my lifetime. Poorly written usually (not always, but usually) translates to uninteresting work and it certainly can mean unintelligible. Similarly, in submitting a paper for publication, sloppy formatting is a strong indicator to me that the underlying material is not publishable. Editors of journals have choices of many papers to publish. They don’t mind if they don’t publish the next great paper from you, because they can publish many other people’s next great paper. If you don’t take your work seriously, why should they take your paper seriously? Also, refereeing a statistics paper is hard and if they can take a short cut by recognizing that the paper is poorly written and formatted, they may reject a paper without making a serious determination as to the quality of the underlying work.

The goal of a paper is to communicate new methodology. Similarly the goal of your proposal is to communicate to your committee that you can write a dissertation. The skills needed to write a good proposal will translate to writing good papers and to writing a good dissertation. So take the formatting seriously and take the writing seriously. At the same time, once the preliminary oral exam is over, and assuming you passed, then the proposal is of little interest to anybody. The amount of work in the proposal that you can re-use in the dissertation and in your papers translates to time saved. Hence the advantage of the form where most of the proposal is a start on your first submitted paper. But any time spent on learning to format the proposal is time well spent. And time spent on learning to write technical prose is time well spent. You will spend your life writing technical prose. The better you write, the more useful you will be to your employer, whether you end up self-employed, a professor or go into industry or government.

The preliminary oral exam is a pass-fail exam. The purpose is to confirm that you can do research, that the research topic you have chosen is worth researching, and that you can do the project. The committee will advise your advisor or you on whether you are proposing to do too little or too much, or that the project is too hard for you.

There are many resources on the web about preliminary exams and proposals. The statistics department at UCLA has a nice discussion of the oral exam at http://answers.stat.ucla.edu/groups/answers/wiki/abdb2/Taking_the_Oral_… and a quick Google search finds many resources at UCLA and around the United States.

Good luck!

How To Be A Kick-A## Teacher

25 helpful pieces of advice.

  1. Comportment:
    1. Walk like you're walking away from an explosion in a Hollywood movie.
    2. Tuck your chin in, tilt your head down and look at people from out of the top of your eyes.
    3. Squint.
  2. Lecturing:
    1. Show up late, then run over time because there is so much to cover.
    2. Bring to your lectures more hardware, dry erase markers, computers, tablets, cell phones and piles of paper and printed references than anyone else.
    3. Consider lecturing in Esperanto so that your international students can learn as much as your native students.
    4. Liberally sprinkle your lectures with Fisher quotes in Latin and French.
  3. Questions:
    1. Take 5 minutes to respond to every question.
    2. Always ask if there are any questions, but don't wait for people to raise their hands.
    3. You have several options for answering questions:
      1. Give a lot of fake life philosophy type advice, but don't actually answer the question.
      2. Point out that you'll get to that answer later in the lecture. It doesn't matter if you actually do get to it later.
      3. Explain how semantic deconstruction enables a pithier answer. Never give the pithier answer.
  4. Give homeworks that are impossible to answer correctly, but don't grade them. Give everyone 100% but only after the quarter ends.
  5. Do not follow the syllabus.
    1. The syllabus should cover everything from Moby Dick to Playfair to Rise and Fall of the Roman Empire to the complete first edition of F. N. David's first book to John Graunt's tables of mortalities and a summary of Biometrika, but only through 1900.
    2. Do not update your syllabus. The syllabus your mentor, who retired the year after you joined the faculty, first wrote during World War II should suffice. He worked hard enough on it after all.
    3. If there are any pre-med students in the class, make sure you cover all the material on the MCAT.
    4. If there are any pre-law students you need not cover the material on the LSAT as there are currently enough seats in first year law classes for all applicants in the United States.
  6. Explain at length that the class really should satisfy distribution requirements in a different area.
  7. Direct everyone to read your blog where you post:
    1. Frequent links to XKCD and phdcomics.
    2. On semantic deconstruction and quantum mechanics.
  8. Teach your students how to do a better job of simulating real data.
  9. Explain the grading policy in detail, but don't follow it.
  10. Orthogonal decomposition theorem. Homeworks, lectures and tests should form an orthogonal basis in knowledge space.
  11. Jokes. Humor always makes difficult material more interesting. If you don't know any statistics jokes, here are some that you can use, preferably without attribution.
    1. Q: How many statisticians does it take to screw in a lightbulb? A: It depends on your model.
    2. Q: What's purple and commutes? A: An Abelian Grape. (Yeah, well).
    3. An applied statistician, an econometrician, a theoretical statistician and a psychometrician walk into the faculty bar on campus and don't talk to each other. Bose-Einstein statistics at work.
    4. Q: Why don't theoretical statisticians play hide and seek? A: Because no one will look for them.
    5. Q: How many econometricians does it take to make chocolate chip cookies? A: Ten. One to stir the batter and nine to peel the M&Ms.
    6. A statistician came home and found his house burned to the ground. When he asked what happened, the police told him "Well, apparently the chair of the math department came to your house, and ...".

      The statistician's eyes lit up and he interrupted excitedly, "The chair? Of the math department? Came to my house?"
    7. Q: Why do statistics departments put questions on asymptotics on comprehensive exams? A: Otherwise there would be no use for asymptotics at all.
    8. Theoretical statisticians do it better, but only asymptotically. And in the long run, ... well, you know. But don't say anything, it makes them happier.
  12. Prove everything:
    1. Proofs are clearer when you use measure theory, even in an introductory class for biologists.
    2. And martingales allow for more efficient proofs. After all, life is a martingale. Biologists need to learn to appreciate that.
    3. Entropy means it really doesn't matter. Physicists should appreciate that.
    4. Note that everything in class has been proved previously in Doob (1953) or in one of Herman Rubin's Annals of Mathematical Statistics papers from the 1950's.
    5. Or was it the 1945 paper?
  13. Develop the theory of minimal length confidence intervals for the mean of a normal when n=1 and the mean and variance are both unknown.
  14. When students are confused, coding theory (Daleks and Ood 1985) provides an alternative way to derive most statistical models. Ontological science should be relegated to discussion sessions where the TA can handle the presentation.
  15. Refer every questioner to Fisher's original publications on the subject for more information.
  16. No software need be installed the first two weeks of computer lab. Never use the same software package two weeks in a row.
  17. Use data sets from your own collaborative papers as examples, with results given in class that exactly contradict what was published in the literature.
  18. Due dates can be given in Julian date (in ISO-8601 format of course) during the first half of the semester and in Aztec calendar form for the second half. Dates outside the semester and during finals week should use Ptolemaic and Carbon dates.
  19. Office hours: check the course schedule to maximize the number of conflicts of office hours with other courses.
  20. Cancel class on leap days, during full and new moons, the 13th of every month, for faculty meetings, special seminars and Tuesday and Wednesday of Thanksgiving, Veterans Day and Memorial Day weeks. Go to at least two international conferences each quarter.
  21. Announce every Friday that class is canceled for Saturday and Sunday.
  22. Take attendance, but never get past the M's.
  23. Always misspell your email address. If possible, delete your email address (there is so much spam after all) and acquire a new one after posting your syllabus to the web site.
  24. Cover ethics, but don't cite your sources.
  25. Per School of Medicine policy, you are required to hold several lectures each semester in rooms other than the scheduled room. Lecture days and room numbers are subject to change even after the lecture has started.
  26. Play classical music during lectures and industrial grunge during midquarters. For a special treat (and personal favorite) arrange for a live performance of Karlheinz Stockhausen's Helicopter String Quartet during your final.
  27. Love your students. Unless that's against school policy, in which case announce that your love is strictly Platonic. Unless that might be misinterpreted, in which case explain the 7 different forms of love, (Eros also known as sorE backwards, Philia, Ludic, Aghast, ACDC, Programa, Flotus, Read-Write, FIFO and GIGO) and after you get them all straight, the first class should be over and you still won't have covered the syllabus and course information sheets. Bring lots of paper towels when you cover GIGO. (For advanced classes: discuss FIFI and FISTR (First In, Screw The Rest) instead of FIFO.)
  28. Finally, after you've mastered all that. Respect your students, love the material, and enjoy yourself.

Next week: Continued third order directed non-Riemannian fractional fast Fourier stochastic differential particle separating longitudinal Latin hyper-active swarm graphs in British bus queueing theory, and practice. Lecturer: Thorin "Missing Totally at Random" Oakenshield.

Short Review: the War of Art by Steven Pressfield

The War of Art: Winning the Inner Creative Battle
by Steven Pressfield

Pressfield is the author of several bestsellers. The War of Art is a 12-step self-help support group for procrastinators, a biological and psychological dissection of procrastination and your own personal writer's side-kick in the war against procrastination all in one short text. Pressfield calls the cause of procrastination resistance. Resistance is the voice inside your head, the one that tells you you'll never make it, you're going to fail. Resistance tells you that you NEED to watch the next episode in your TV show, NOW; that going shopping IMMEDIATELY is more important than sitting down to write your dissertation. Pressfield takes resistance apart, explains it in clear language and explains how to overcome it. The book is a quick read. Reading War of Art won't satisfy resistance and as soon as you read the book, resistance is going to kick into high gear with long discussions of why it's important to, well, do whatever it is except get up and do what needs doing, powdermilk.

Resistance is that thing that makes us read tons of Andy Gelman blog posts instead of working on our next paper. Blogging could arguably also be a form of resistance. I prefer to think blog-keeping is my way of staying sane and cataloging a few of my semi-great thoughts for my future students to hear. Are you listening, future students? Hear hear! And more importantly, blogging is my way of practicing writing on a frequent enough basis to grease the mental writing wheels. 

The War of Art title harkens back to The Art of War by Sun Tzu, available for free on most fine digital reading platforms in multiple versions. I've yet to make it even partway through Sun Tzu's book, but I made it through Pressfield's book in a few bus trips in to work. 

Resistance is feudal. [I've always wanted to write that.] It holds you in fief, and demands you do anything but what is important. 

I'll keep this review short. Be done with your blog reading. NOW. STOP READING THIS BLOG! Go do something you aspire to. If something is holding you back, that something is resistance. Read The War of Art and get yourself on track. 

Short Review: Writing Tools: 50 Essential Strategies for Every Writer

This is the first of perhaps three short book reviews. 

Certain basics of writing I go over with almost every student. Organization, content, paragraphs and sentences. Roy Peter Clark's Writing Tools: 50 Essential Strategies for Every Writer covers most of them. Clark is an entertaining writer and this is a highly enjoyable read. It's worth reading as literature, even if you aren't in the market for improving your writing. This is highly recommended for beginning professional writers. That includes beginning statisticians like my students and continuing statisticians like myself.

There are four major parts: Nuts and Bolts, Special Effects, Blueprints, and Useful Habits. Each part contains from 10 to 16 short chapters each presenting a different 'Tool'. As I read, my head kept nodding up-down, yes-yes, uh-huh, yep, told so-and-so that at our last meeting, made those comments yesterday to such and such. Roy Peter Clark presents the tools more succinctly, colorfully and intelligently than I can. He's got more tools than I have and they are ordered, cataloged and polished; I learned a lot!

One strength is that he presents his advice as tools, not as rules. Rules suggest hard and fast brook no prisoners strict laws. A tool is adaptable to many different situations. A tool helps, rules restrict. 

Nuts and Bolts gives expert guidance on writing strong sentences. Clark spends less time explaining common mistakes and more on how to avoid the mistakes in the first place. This is material I spend a lot of time on with my students. Begin sentences with subjects and verbs (Tool 1); place strong words at the beginning and at the end (Tool 2); strong verbs create action, save words (Tool 3). Those are the title, subtitle, and part of the subtitle for the first three tools. Clark crystallizes rules I didn't realize I knew, and adds to the tools I have for writing and for advising students. Clark explains when to use passive voice -- virtually every student starts writing too much passive voice. Some students were taught to use passive in scientific writing (didn't that advice die decades ago?). Students write in passive voice when they are unsure of what they write; they try to distance themselves from what they have written.

Fear not the long sentence (Tool 7) is advice I usually can't use, but set the pace with sentence length (Tool 18), prefer the simple over the technical (Tool 11), and in short works don't waste a syllable (Tool 37) are very appropriate for scientific writing. Prefer the simple over the technical has the subhead: Use shorter words, sentences and paragraphs at points of complexity. This is great general advice as well as great specific advice for any given sentence, as technical writing is almost always quite complex. Get the name of the dog (Tool 14) is about supplying informative details -- something that virtually all students do not understand until taught. That model you just presented: tell us what it does, how it works (data in, inferences out, but how?), and why it is needed (What's so great about it?)!

Other tools would never have occurred to me, but are quite valuable. Save string (Tool 44) talks about saving up little ideas, thoughts and data until you have enough for a paper. I've been doing that, sometimes for decades, but didn't have a name for the behavior or a way to even think about the behavior. Some tools I should engage in but haven't: Recruit your own support group (Tool 47) I should do more of, while turn procrastination into rehearsal (Tool 41) is my excuse for every delay. Some tools are better for fiction and newspaper writing, but they're fun to read and think about and may be utile even in scientific writing. Use dialogue as a form of action (Tool 26) talks about how the eye is drawn to short sentences with lots of white space -- advice I promptly used in advising someone preparing presentation slides.

Tune your voice (Tool 23) I took to heart as advice to me about advising my students: let my students find their own voice. Similarly, limit self-criticism in early drafts (Tool 48) is vital for getting the meat of a project on paper before tightening up language and organizing the content. Too much criticism too early and the creative brain shuts down, and that segues into my next review.

I strongly recommend Roy Peter Clark's Writing Tools: 50 Essential Strategies for Every Writer to every graduate student and to any professor who wishes to improve their writing.

More on Writing, Guest Post

This from Dr. Robert Bolan of the LAGLC. 

I agree with Rob’s choices of writing references. Strunk & White and Zinsser are indispensable and, perhaps not so surprisingly, they are written well enough so they actually can be read and not only used as quick lookup sources. Of course there are others but these are touchstones of proper English grammar and word usage.

So much of good writing, as Rob suggests, is trying to achieve absolute clarity with the words you choose and how you string them together. Economy is a sacred principle in good writing. Use the right words and use as few as possible. Also, rearrange sentences to get the flow right. For guidance on these skills I like Getting the Words Right: How to Revise, Edit & Rewrite by Theodore A. Rees Cheney. For assistance in technical writing there are several references. I like Merriam-Webster’s Manual for Writers & Editors. For sheer brilliance and clarity of advice, check out Robertson Davies’ Reading and Writing, a slim volume you can read in two or three sessions (read slowly, let sink in, do not gulp this one down). And finally, I offer my fervent belief that scientific writing, although requiring parsimony and precision, need not be dry and devoid of style. Read anything by John Gardner on writing, Stephen King on writing, Eudora Welty on writing, or any novelist or essayist whose style you admire. And then when you’re done with all that, read Paradise Lost—aloud—not for comprehension but for the sheer thunderous music of it. Philip Pullman, who wrote the introduction to my edition of Milton’s masterpiece, remarked that "the experience of reading poetry aloud when you don’t fully understand it is a curious and complicated one. It’s like suddenly discovering you can play the organ." You will likely be thinking that poetry has nothing to do with scientific writing. I disagree. All writing exists for the purpose of communication. Again, scientific writing need not be sterile—although that often appears to be the gold standard for editors. If you have something important to say, you must say it clearly, of course. But cadence and musicality, sparingly used, can deliver your meaning with an elegance that will, unbeknownst to the reader, nestle it into place with crystal clarity. Compose, don’t just write.

Obsessive attention to nuance and detail in writing can be a curse as well as a virtue, and every true writer can identify with the following. A friend of Oscar Wilde’s is reported to have asked him what he did yesterday.  Wilde replied: "In the morning I took out a comma. In the afternoon I put it back in again."

Me again regarding this last: A wonderful short essay on being too critical of yourself early in the writing process is Gail Godwin's The Watcher at the Gate. 

Some Questions from an Undergraduate for a Biostatistics Graduate Admissions Chair

These are questions I answered recently for an undergrad who had questions about how best to prepare herself for graduate school. I thought the answers might be more widely of interest, and with a lot of editing, am including the questions and answers here. Disclaimer: Your mileage may vary. A lot. 

What do graduate programs look for in an applicant? How does admissions work? 

Different programs differ immensely. I imagine that what got me into my graduate program only got me in because I happened to be taking a course from the department chair the quarter I applied to graduate school. My own grad school application would not get me into the UCLA Biostat PhD program at this time, though it might get me into our MS program. [There's a pretty harsh moral in there somewhere.] Biostat programs often have fewer math requirements than do statistics programs, but they also may or may not teach as much math stat as a stat program. Ours does teach plenty of math stat. A lot of programs (stat and biostat alike) admit to the MS, then pass you through to the PhD program if you do well. We admit a few people to the PhD, many more to the MS, and we admit to the PhD from our own MS. I believe this is how many programs work. (Another model is that some programs admit only PhD students, but if people don't make it through the PhD, they are sent off with an MS degree as a consolation prize.) There are no doubt other models.

Is it more important that I take particular mathematics courses before sending in my application or get good grades in the ones that I do take?

I'd vote for good grades. A good grade when you've only taken one or two (upper division) math courses is an 'A'. If you're going for a PhD you'll need to show that you can do PhD work though, and that means taking a few difficult math courses. Undergrads usually don't take what I would call "really difficult statistics courses". But that is certainly university/program specific. If you take a lot of math classes, then you have enough to average out the occasional bad grade. But don't ask me to tell you how to balance the occasional bad grade (i.e. a B) against more mathematics courses.

What do you look for in a graduate student?

You need to put in many hours (Confer Outliers, by Malcolm Gladwell) to get really good at something. So putting in the time is worthwhile, starting now. For example, no graduate program teaches you everything about a subject; if you're going to have a well-rounded education after receiving your PhD, you're going to have to teach yourself more than you learn in class. How do you teach yourself material? Start now, trial and error, learn stuff, do badly on some things, better on others, and most importantly, keep on trying. So that's what I really want in a doctoral student: curiosity, an ability to figure stuff out, an inquiring mind, stick-to-itiveness, someone interesting we'd really like to have around for several years. Sadly that's not what is tested for on the GRE.

Do you, as an admissions officer, look at the courses applicants are "currently enrolled in" even if there is not a grade attached to that course?

Yes, absolutely. But mainly when it matters to my admissions decision. Here's one hypothetical example: Consider someone with all B's in their first 2 years of college who suddenly 'finds themself' and gets all A's their junior year; if they are a math major, I really want to know about those Fall grades. If they are all A's, that person could be considered for the PhD program. If they are a mix of B's and some A's, maybe I'm willing to admit to the MS program only. For someone with all A's in their first three years of undergrad (yes, they exist), I probably don't need to see senior year Fall grades to make an admissions decision.

Here's another common hypothetical: someone who qualifies for one degree program, but is enrolled in courses that, if she gets A's in them, would qualify her for a different degree program. And suppose she prefers that 'different degree program'. Then I'm looking at the courses, and I actually need to wait to see the grades before I can make a sensible admissions decision.

What math courses specifically do you look for in a student's application, or suggest I take before I apply?

For the MS program you need to take UCLA's Math 31AB, 32AB and 33AB sequence. [Those of you elsewhere can look up what those courses are and find the matching courses at your own university.] Next is any and every junior/senior level math class. Once people are enrolled in our MS program, but supposing they are interested in the PhD program, there isn't a lot of time to take many math courses, so we basically only recommend that they take real analysis 131a (and 131b) and linear algebra (115a and sometimes 115b). I guess those qualify as the courses to take if you can only manage a small number of courses. But a lot of junior/senior level math courses can be useful, depending on what sort of statistics one ends up in. Also, real analysis can be a real bear of a course, and it may be helpful to ease into it by taking something else junior/senior level first. For direct admission to the PhD program, more math (with good grades, natch) is always better. Remember: You can never be too thin, too rich or know too much math.

Advice to a Prospective Biostatistician

This is advice to a prospective student wondering whether to go into public health/epi or biostatistics. I'm willing to blindly argue for biostatistics, but prospective students might find it more useful if I frame the issues so they can decide for themselves. 

Biostatistician or subject matter specialist? Do you want to work in one discipline or do you like to move around? Not sure what exact field you want to work in?

From the perspective of scientists, biostatisticians are natural generalists. Biostatistics is great for people who like science, or better, for people who like sciences. As a kid, when asked what I wanted to be when I grew up, I had a long list of 'ists' that I wanted to be. As a biostatistician, I can work with an environmental health scientist this morning, then work on maternal and child health in South African townships in the afternoon. I teach classes to apprentice statisticians. These apprentice statisticians work with scientists from a variety of fields on a wonderfully varied set of research questions and data analyses. In some classes, I teach scientists: political scientists and epidemiologists, educators and emergency room docs, health policy wonks and psychologists, geneticists and nutritionists and the occasional urban planner. And after they've taken my class, we can talk and communicate at a higher level than we could before they took my class. And we've written papers together, because they still don't know enough statistics to solve their data analysis problems but now they can handle the software and understand the issues if we talk things through together.

The world needs both specialists and generalists. In the discipline of biostatistics, biostatisticians do specialize. I specialize in Bayesian statistics, hierarchical modeling and longitudinal data. Other people specialize in psychiatric statistics or causal inference or survival analysis. 

Scientists ask biostatisticians how to analyze data, and they ask us to analyze their specific data. They ask us how to design studies, and they ask us to design their specific study. We write data analysis plans, we do power calculations, we analyze data, we report results. We teach statistics, and we figure out better ways to analyze data.

Statistics or biostatistics? Biostatistics is not much different from statistics. In biostat, we're usually concentrating on analyzing data from public health, medicine, biology and public policy. Statisticians can do that as well. 

Did you want to go to med school? But you preferred math? Biostat may be for you!

How do we analyze the data? How should we analyze the data? What can we do in the time available and with the budget allotted and with the skills we currently have? What's the answer? How do we even get to an answer? What does the answer mean? How good is this answer? Those are the questions biostatisticians discuss and work on and ponder and figure out how to answer. We do this in the context of particular data sets, working closely with scientists and advocates and doctors and teachers to help them understand the truth of their data and the truth of the world seen through their data. Biostatisticians do this generically, at a higher level, thinking about how to analyze data sets like this data set. We develop statistical methods that will help analyze classes of data sets, not just a single data set. 

Public health trains scientists, advocates and educators. Epidemiologists or community health scientists or anthropologists or environmental health specialists are usually scientists, and some are advocates and some are educators and some are combinations of all three. Health policy produces managers and also scientists. Biostatisticians are scientists, but we are general scientists. We know how to do science and we know how to think about science. We have a general outline of science sitting in our heads; when we learn about a particular study or discipline, we fill in that outline with more details, but the outline never goes away. In grant writing, we know the endgame: we know what we will do with the data we're proposing to collect. The endgame tells us whether the experimental design will return the information the scientist needs, and it points us to better designs to collect data. We know where we're going so we can advise on how to get there.

In academia especially, biostatisticians get to do all the fun stuff: designing studies, analyzing data, talking to students and other statisticians and scientific colleagues about what it all means, writing up results. We leave the difficult annoying time-consuming mind-numbing stuff to our colleagues: figuring out what questions to ask our subjects and how to phrase them, actually collecting the data, keeping the mice and feeding them, storing the data, though we dive in and consult as needed.

What do you want to be when you grow up? I want to be an environmental scientist, a paleontologist, a nutritionist, and an epidemiologist. I want to talk to physicists and computer scientists, biologists and geneticists, engineers and business people, field workers and economists. I want to be a biostatistician. 

Intuitively, it is clear that it is obvious that any idiot can see

In technical writing, three terms/phrases not to be used:

  • Intuitively, ...
  • It is clear that ...
  • It is obvious that ...

Just as well you could write

  • Any idiot can plainly see ...

These phrases may be true for you, the writer. However, the reader won't have your depth of understanding of the subject matter and especially of the current notation and material. When the reader does not find the truth as plain as the unvarnished nose on your face, using these phrases directly affronts dear reader. As figure 17 plainly demonstrates both to the unaided eye and to any imbecile, obviously you shouldn't use these phrases. Even my pet gerbil knows using these phrases will insult anyone with a kindergarten education and clearly get your paper rejected posthaste.

Acknowledgments: I first heard this advice indirectly from Don Berry via a student taking a course from him. Berry gave extensive feedback on a writing assignment. The 'even my pet gerbil knows' phrase is courtesy of Andy Unger, MN chess expert. 

Clarity and Kindness

I'm editing a generally well written, near-final draft of a biostatistics paper. Worth broadcasting are several writing problems that occur in almost all grad student writing. 

  • Don't denigrate your contributions. 
    • Original: A simple way to achieve this ... 
    • Edit: A way to achieve this ...
    • Comment: Be respectful of your contributions. Are you so close to your own solution you can't see how important it is? Perhaps you've forgotten how innovative your solution was, given how long you've been living with it. Modesty, either real or false, is not rewarded in academia. Besides, if you're really a scientist (and you are if you're a statistician), honesty is an important characteristic. Being honest about the importance of your work may not be easy, but it is important. Your work may be mathematically simple, but if you describe your idea as simple, readers will assume you meant that the idea is trivial.
  • Don't represent.
    • Original:  ... [statement of key idea] because this represents [key idea alternative] ...
    • Edit: ... [key idea] because this is [key idea alternative] ...
    • Comment: Represents is wishy-washy and could imply any of a number of relationships. Be firm. If A and B are the same thing, say A is B, not A represents B. 
  • Use the same language every time.
    • First Original: dispersion around $x$
    • Second Original: dispersion
    • Edit both times: dispersion around $x$ 
    • Comment: Apparently in the original text, there can be more than one dispersion. Describing the dispersion as "around $x$" implies there are or could be other kinds of dispersions not around $x$. Thus the need to keep the modifier in repeated usage.
  • Plot, don't show.
    • Original: Figure 2 shows ...
    • Edit: Figure 2 plots ...
    • Comment: Or: Figure 2 is ... . Figure 2 doesn't show anything if you're not well enough educated to understand the plot in the first place. Figure 2 contains the plot, but Figure 2 doesn't show anything. 


Guest Post: The Importance of Keeping Your CV/Resume Current

Guest post by Robin Jeffries, copied from the niece* blog NorCalBiostat.

My graduate advisor was adamant about me keeping my CV current. Every little consulting project, every award, presentation or co-authorship on a paper had to be on there. When I would share my joy at getting an award, or at having a conference presentation or poster accepted, his immediate first statement was “Is it on your CV yet?” Well, perhaps after a congratulations.

It’s such a simple thing to do, but also a simple thing to keep putting off and then forget. Over the past few years I’ve gotten better at adding things on almost immediately, and it has paid off so many times. Right now I’m very casually looking at what my next career step will be. When I find something that I just can’t pass up, I am always thankful that only minor changes and updates to my CV need to be made. Applying for jobs can be stressful enough. Keeping your CV up to date makes it one less thing to worry about. Save your energy for your cover letter.

And don’t be afraid to change the style on your resume now and again. Yes, it can be a lot of work, but tastes change and what you thought was an amazing font may not look so good a few months later.

Same concept applies to blogs, but that will take me much longer to become a habit.

I concur.  

* Robin was my doctoral student. This is my blog. NorCalBiostat is her blog. The doctoral student of my doctoral student is my grand-student. Andy Gelman regularly refers to his blog's sister blog. Therefore she is my blog's sister, and her blog is my blog's niece blog. Does Ancestry.com have any documentation on this?
