## Writing a Biostatistics Doctoral Dissertation Proposal

You have finished all your courses. You have passed your written comprehensive exams. Congratulations! What’s next? If you haven’t already (and you should have), you pick an advisor and start to work on your doctoral dissertation. Writing a dissertation and finishing your doctoral degree involves several steps. Two important steps in finishing your PhD are partially bureaucratic in nature: the preliminary oral and the final oral. These may well be the last two exams of your academic career. This blog post is about the preliminary oral exam and the dissertation proposal.

Depending on your university rules and department traditions, the specific steps in preparing a thesis will vary. Here I talk about my university, which is UCLA; my department, which is biostatistics; and this moment in time, which is late 2018. But please realize: the rules, procedures, and customs surrounding prelim oral exams and dissertation proposals have evolved over the decades. They are not fixed in stone, except by university and department written rules, which can and do change. You can expect procedures to evolve over time and to vary by committee and most especially by dissertation advisor.

Talk to the student affairs officer to schedule a room for the exam. Often the exam takes place in our biostatistics library, but occasionally it can be an outside room depending on scheduling conflicts. Typically the room is scheduled for half an hour before the exam so that you can set up your computer and display equipment. The exam lasts 2 hours, and then you have the room for another half hour to dismantle the equipment, though it usually shouldn’t take that long.

The preliminary oral exam is closed, meaning only the student and the dissertation committee members are allowed in the room during the exam. One committee member who is not the chair or co-chair may connect remotely – I really don’t recommend that – but it is allowed.

During your presentation, the committee members will typically interrupt with questions. There are many purposes for these questions:

• To clarify meaning, as in a short clarification or asking for a definition;
• To see if you understand what you are saying;
• To see if you can think on your feet and respond to something different;
• To see if you understand or are aware of a particular reference;
• To see if you can extend your work in response to a new idea;
• To see if you can explain what you just said in a different way;
• To see if you can answer a question in the middle of a talk.

Students often make assumptions about what the questions mean or imply, and these assumptions are usually incorrect. Do not assume that a question means that the committee member disagrees with something you said. Nor should you assume that the committee member doesn’t understand you, even if the question starts with “I don’t understand …”. An angry question doesn’t mean the faculty member is actually angry with you; it more likely means they didn’t get enough sleep the night before or that this is simply their style. By far the best approach is to take the question at face value and answer it as well as you can.

It used to be that the committee had 5 members, but in these busy times, it has become impossible to get 5 faculty members into a single room at the same time, so the University has reduced the committee size to 4 faculty members. The rules on who can be on a committee are surprisingly complicated. For UCLA’s rules, see https://grad.ucla.edu/academics/doctoral-studies/minimum-standards-for-d.... The general intent is to get enough faculty on the committee to supply additional expertise should the student need it, to provide faculty expertise to be able to confirm that your dissertation will be a new contribution to knowledge, and to ensure that the committee members have sufficient seniority to provide sensible guidance. At the same time, there is flexibility to find additional expertise if needed, including potentially going outside the university to find a committee member with needed expertise.

The biostatistics department has placed an additional constraint on dissertation committees: one of the faculty members must have a primary appointment from outside the department. This is intended to mean that someone with subject matter scientific expertise is included in the committee, not that someone from mathematics, statistics or computer science is placed on your committee. Our intent is that you explain to someone who is a scientist, and who is specifically not a statistician, what the tools you will develop in your dissertation are, and why they might potentially benefit the scientist. Explaining to a scientist what your tools are teaches you to explain your statistical tools non-technically and requires that you think about the scientist while you work on your dissertation. It’s not enough to say that you’ve improved the root mean squared error of some estimator – what good will you do for the scientist, and by extension, society with your dissertation work?

The dissertation proposal is a document that you write that tells the committee where you plan to go with your dissertation. This document can take many forms, and it may range in length from arbitrarily short to arbitrarily long, with 40 to 100 double-spaced pages being pretty common. The dissertation proposal has several purposes, though an individual document may not serve all purposes:

• To show what you know. What you know might be demonstrated by a literature review for example. A long literature review is definitely not required, and is becoming rarer.
• To show that you can handle the thesis topic. For example, by illustrating the results of example calculations that are similar to what needs to be done in the topic.
• To show that you can do research.
  • The easiest way to do this is to actually show some novel model or novel results in the proposal.
  • For example, your advisor might have you start working on writing a first paper and that material would then show up in the dissertation proposal.
  • However, you might demonstrate research competence by demonstrating your awareness of past research and knowledge of currently unanswered questions.
• To show the committee an outline of the future research you intend to undertake in your dissertation. This is the proposal part of the dissertation proposal.
  • This includes any novel research already undertaken that might be given in the proposal.
  • But typically this is unfinished work and is in outline form and shows the committee that you have an idea of where you are going and what you will do.
  • Outline form may mean a paragraph or two on each idea that you propose to execute during your dissertation research.

A very important part of the proposal is where you indicate what is old and what is novel. That is, what old material has already occurred in the literature, and what new material is your own novel work. You will be receiving a PhD because of your novel contributions to biostatistics. If you don’t indicate what is novel and what is old, you cannot expect your committee to understand this distinction. If you don’t indicate what is new, then it becomes up to the committee to figure out what is new, and they might err on the side that everything you said is review. Much easier if you tell them what is new.

In your proposal you should have a section that outlines the planned future research: a “proposal” section. The proposal section will sketch projects that you plan to tackle in your dissertation. I consider it important to have a worked numerical example that shows that you are able to compute with the sort of data and models and methods that you intend to develop in the dissertation. If you’ve submitted your first paper prior to the preliminary oral, you can add an introduction, a non-technical discussion of your paper and proposed additional work, and a proposal section and your proposal is ready for the committee.

Your committee members are expected to read your proposal but might not. If they read the proposal, you can assume they will read or skim the proposal the night before. Thus there is little value in checking in with faculty about issues or comments prior to your talk. Your talk needs to be self-contained, and should not depend on the committee having read the proposal.

The member who is not from biostatistics may well have difficulty reading the mathematical statistical portion of your proposal. So why did the department require you to have a non-statistician scientist on your committee? The reason is that we want you to be able to communicate the value of your statistical research to a non-statistician. Biostatistics has a substantial collaborative aspect to it. Thus, as part of your proposal, you should have a section that explains the value of your work in layman’s terms. This is a courtesy to the outside member of your committee, as well as being important in its own right. Similarly, as part of your dissertation, you should have a section or chapter that describes your contributions to biostatistics and science in non-technical language. This section should be completely accurate but not rely on mathematical notation or technical statistical jargon to make its points.

A dissertation can take many forms. A common form that is increasingly popular is to write three separate research papers and then bind them in dissertation format and submit these as the final dissertation. This is not required, and is decided upon primarily by the advisor, with input from the student and possibly the committee. The dissertation is supposed to be publishable, but if one merely writes a dissertation that contains three publishable ideas then it can take a long while to turn the dissertation into the three papers. In contrast, if one writes three papers, then it is quite quick to turn three papers into a dissertation, taking perhaps a few weeks at most, with time mostly spent on formatting your papers into the UCLA dissertation style. For students interested in academia these days, substantial ability to publish must necessarily be demonstrated, so having a good CV out of grad school with a number of publications published or in submission is necessary. The three papers model of a dissertation is required for those students. Similarly, many faculty require the three papers model. I assume this model for the dissertation in the remainder of this discussion.

When writing your proposal, there are a number of technical issues. You may be learning LaTeX, or even if you know LaTeX, you will need to learn new features to format your proposal properly. Similarly, you need to learn BibTeX to format your bibliography.

I have read a large number of papers submitted for publication in my lifetime. Poor writing usually (not always, but usually) signals uninteresting work, and it can certainly make the work unintelligible. Similarly, in a paper submitted for publication, sloppy formatting is a strong indicator to me that the underlying material is not publishable. Editors of journals have many papers to choose from. They don’t mind if they don’t publish the next great paper from you, because they can publish many other people’s next great paper. If you don’t take your work seriously, why should they take your paper seriously? Also, refereeing a statistics paper is hard, and if referees can take a shortcut by recognizing that the paper is poorly written and formatted, they may reject a paper without making a serious determination as to the quality of the underlying work.

The goal of a paper is to communicate new methodology. Similarly the goal of your proposal is to communicate to your committee that you can write a dissertation. The skills needed to write a good proposal will translate to writing good papers and to writing a good dissertation. So take the formatting seriously and take the writing seriously. At the same time, once the preliminary oral exam is over, and assuming you passed, then the proposal is of little interest to anybody. The amount of work in the proposal that you can re-use in the dissertation and in your papers translates to time saved. Hence the advantage of the form where most of the proposal is a start on your first submitted paper. But any time spent on learning to format the proposal is time well spent. And time spent on learning to write technical prose is time well spent. You will spend your life writing technical prose. The better you write, the more useful you will be to your employer, whether you end up self-employed, a professor or go into industry or government.

The preliminary oral exam is a pass-fail exam. The purpose is to confirm that you can do research, that the research topic you have chosen is worth researching, and that you can do the project. The committee will advise your advisor or you on whether you are proposing to do too little or too much, or that the project is too hard for you.

There are many resources on the web about preliminary exams and proposals. The statistics department at UCLA has a nice discussion of the oral exam at http://answers.stat.ucla.edu/groups/answers/wiki/abdb2/Taking_the_Oral_E... and a quick Google search finds many resources at UCLA and around the United States.

Good luck!

## Rare Events, Gun Violence, and the Nefarious Large Organization

Suppose you are the Nefarious Large Organization (NLO) and you want to kill lots of random Americans but you do not want your fingerprints on the weapons.

How might you go about doing this? Having a long time frame helps. You're going to need a large political arm that can help with loosening laws to create the conditions you need. You'll need to grow your NLO.

A two-pronged approach can work wonders: which weapons, and which people. You need to get large numbers of weapons into the hands of lots and lots of people.

First, which weapons? Knives don't kill many people, are very personal, and are hard to work with. You might get hurt trying to use a knife, especially if someone fights back. Bombs are difficult to make and have a habit of blowing up their amateur makers as much as blowing up the intended targets. Plus they need to be hidden. There is a military complex that makes bombs, and there is a cottage industry in bomb making in parts of the third world, but in the first world, bombs perhaps aren't optimal. You would need better bomb training and if your NLO started marketing bomb making courses, people might catch on when bombs with your designs started killing lots of people. Guns would be better, as other companies make them. Your NLO doesn't need to make guns, just advocate for their use and acquisition. Although guns take some skill, that skill is easily acquired. Or you can shoot into large crowds where it is hard to miss. But Saturday night specials and revolvers have a limited number of bullets. Having guns with lots of bullets that shoot quickly in the hands of many people is paramount for achieving your goal of killing lots of Americans. Thus the need for automatic firearms.

This is where your NLO political operation becomes paramount. It takes a long time frame and plenty of cash to get politicians to set the stage where you can get these automatic firearms into the hands of lots of people. This may take you decades. Advertising can lend a cachet to owning automatic firearms, and after the population of owners gets large enough, network effects will work to spread their availability and ownership.

If you can get enough weapons into enough hands, the 'which people' part will solve itself as there are always some people willing to work with you in your goal of wanting to kill lots of Americans without your fingerprints on the weapons. You won't even need to communicate directly with those people.

Humans do not classify neatly into 'good people' or 'bad people'. We're all good some of the time and we're all bad at least occasionally. The fractions of good/bad vary from person to person and over time within a person. Toss in a rare event (what people call 'bad luck' or 'an accident') and something terrible can happen. You speed, you drink and drive, you walk atop a wall, you clamber up a steep cliff. People can be immature, then they grow up, get a job, get married and settle down. People can be fine, then get depressed or desperate. Your spouse or mother dies and you have no one caring for you, keeping you safe and on an even keel, keeping those mental demons quiet. You suddenly are isolated, alone. You have difficulty in school, get teased a lot.

It's a fiction that we just have to keep the guns out of the hands of the 'bad people'. Truth is that most of our killers are 'good people', legally speaking, right up until they actually start killing people. Medically, it's virtually impossible to distinguish people at risk of doing bad things some time in the future from those who are not. Certainly before they bought or inherited that first gun, before their friend or neighbor or local advertising circular introduced them to that gun, they were fine, legally speaking. They buy that first gun, they still classify as 'good people'. Before something bad happened to them, before they lost their job, before their last parent died, before they were picked on, before they became depressed or suicidal, people were 'good people'.

You, the NLO, need to get those guns into the hands of millions of people. Some of those people are going to have problems. They become sick. They may get depressed. They get in a car accident. Socially they are isolated, or they become isolated. They don't fit in. They get teased. They feel like they are teased, even if they are not. Mild paranoia may set in. They think strangers and foreigners are going to take what is rightfully theirs. They may not have been raised with a full set of social skills to handle modern society. These people aren't all that common; they're not the majority. Most of us humans are doing fine, are decent people and aren't teased so badly that we think about becoming killers or drug addicts. That's why you need millions of gun owners. Not all of those millions of gun owners are going to have good sense. They may give or sell their guns to some troubled soul. They may encourage some troubled soul to buy guns as a way to self-worth. They may themselves grow to have psychological problems even though they've been fine for decades. Take millions of people, and some of them are just going to be 'off'.

You make things easy, those bad things you want to have happen will happen more often. Make it easy to commit suicide, more people will commit suicide. Make it easier to commit mass murder, more people will commit mass murder. We are not talking about making the average person commit suicide or mass murder. Just those in the extremes. Just loosen the laws enough so that one person in a million has the ability to shoot someone else or to shoot themselves.

Similarly, reduce health care, screw up schools, reduce social services, make lives more miserable, increase access to drugs, do what you can to cause more difficulties for more people. This will provide a lush environment to grow people with the potential to have further problems.

Maybe these troubled individuals are only one in a million. This is the problem and the issue of rare events. This is where you, the NLO, can hide behind the problem of rare events. How could we foresee some random, rare event occurring? Who, us? How could we have figured out that the lone individual in Las Vegas or Parkland or Orlando or Sandy Hook might be that one individual who feels so aggrieved that we wish we could have seen them coming? Gee, not my fault, you say.

If an event is one in a million, then you need millions of attempts to get a 'success'. If you're the NLO, you need millions of gun owners, so that your guns end up in the hands of that one person who decides that killing a lot of others is the solution to their problems. Low probability events happen when you have lots of chances. Someone always wins the lottery. Your NLO political arm has been working tirelessly for years. Loosening laws, fighting hard, fighting dirty, making glib arguments, buying off amenable politicians. You can tolerate rare events, because you've gotten your guns into the hands of millions of people. You've got millions of chances to hit the jackpot.
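The arithmetic behind "low probability events happen when you have lots of chances" can be sketched in a few lines. Assume, purely for illustration, that each of $n$ people independently has the same small probability $p$ of the event; then the chance of at least one event is $1 - (1-p)^n$. The independence assumption, the value of $p$, and the values of $n$ below are hypothetical choices for the sketch, not estimates of anything.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """P(at least one event) in n independent trials, each with probability p."""
    # 1 - (1 - p)^n, written with log1p/expm1 so that tiny p does not lose precision.
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    p = 1e-6  # hypothetical "one in a million" per-person probability
    for n in (1_000, 1_000_000, 10_000_000):
        print(f"n = {n:>10,}: P(at least one) = {prob_at_least_one(p, n):.4f}")
```

With a thousand chances the event remains very unlikely; with millions of chances it becomes more likely than not, and then close to certain.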

You don't need everyone to want to do your bidding; you just need one person. One person in a million.

Modeling is important and useful. Having killers getting lots of press attention in the popular media is very helpful to your NLO plan. More and more of your automatic rifle owners see that killing lots of people is a really neat way to get some attention. The media is on your side, NLO, because the media will spread around the how-to and the what-to and the fame of these killers.

And when you've done your job right, your NLO fingerprints aren't on the trigger. When the killing is over, there is no obvious link between killers. But you've achieved your purpose: setting many, many small-probability events in motion, incubating, waiting to see who cracks next, who decides to shoot next.

As the smoke settles, as survivors decide whether to tear down or rebuild the building, as survivors make memorials and attend funerals, your spokespeople say: don't politicize this, don't make decisions in the heat of the moment, don't do something you'll regret later. Your political arm materials just write themselves, don't they?

Sadly, the memorials need to be written too. She wanted to be a scientist. He was everyone's friend. They just wanted to watch a good movie. She was an inspiring teacher. He was a great football coach. He was here on vacation. She was in 1st grade. I don't suppose, NLO, that you'd like to contribute to writing the memorials for the people killed today? How about the people killed yesterday? Your machinations are working, NLO. Congratulations. Maybe you'll work on the future memorials for those killed tomorrow? We appreciate your assistance.

## Time to Update the P-Value Dichotomy to a Trichotomy

In executing a classical hypothesis test, a small $p$-value allows us to reject the null hypothesis and declare that the alternative hypothesis is true.

This classical decision requires a leap of faith: if the $p$-value is small, either something unusual occurred or the null hypothesis must be false.

These days we should add a third possibility: that we searched over several models and methods to find a small $p$-value. We need to update the $p$-value oath of decision making to state: either something unusual happened, we searched to find a small $p$-value, or the null hypothesis is false.
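A rough sketch of why searching matters: when the null hypothesis is true, a valid $p$-value is uniformly distributed on $(0, 1)$. If we try $k$ analyses and report the smallest $p$-value, and if (a simplifying assumption made only for this sketch) the $k$ $p$-values are independent, the chance of seeing $p < 0.05$ is $1 - 0.95^k$ rather than $0.05$. In a real search the models are fit to the same data, so the $p$-values are correlated and the inflation is typically smaller than this, though still present. The function names below are hypothetical.

```python
import random


def search_false_positive_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that the smallest of k independent null p-values is below alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_search(k: int, alpha: float = 0.05, reps: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: per replicate, draw k uniform p-values and keep the minimum."""
    rng = random.Random(seed)
    hits = sum(
        min(rng.random() for _ in range(k)) < alpha
        for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    for k in (1, 5, 20):
        exact = search_false_positive_rate(k)
        approx = simulate_search(k)
        print(f"k = {k:>2}: exact = {exact:.3f}, simulated = {approx:.3f}")
```

Under these assumptions, a search over even a handful of analyses makes a "significant" result under a true null far more common than the nominal 5%.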

Note that being Bayesian doesn't necessarily avoid this problem. Suppose we have a regression model $Y = X\beta+ \mbox{error}$. Apologies for not defining notation, except that $\beta$ is a $p$-vector with elements $\beta_k$. One way to define a one-sided Bayesian $p$-value is as the posterior probability that $\beta_k$ is less than zero. If this probability $P(\beta_k \lt 0 | Y)$ is near 0 or near 1, then we declare "significance". Basically, the Bayesian $p$-value tells us how much certainty we have about the sign of $\beta_k$. The usual classical two-sided $p$-value is approximately twice the smaller of $P(\beta_k \lt 0 | Y)$ and $P(\beta_k \gt 0 | Y)$. How close the approximation is depends on the strength of the prior information relative to the information in the data, as measured by the observed Fisher information. The Bayesian $p$-value is subject to the same maximization by search over models as the classical $p$-value.
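The "twice the smaller tail" relationship can be checked in a simple special case. The sketch below assumes a flat prior on $\beta_k$ and a known error variance (or a large-sample normal approximation), so that the posterior of $\beta_k$ is normal with mean equal to the least squares estimate and standard deviation equal to its standard error. Under those assumptions the relationship is exact rather than approximate. The estimate and standard error used are hypothetical numbers.

```python
from statistics import NormalDist


def bayes_prob_negative(beta_hat: float, se: float) -> float:
    """P(beta_k < 0 | Y) when the posterior is Normal(beta_hat, se^2) (flat-prior assumption)."""
    return NormalDist(mu=beta_hat, sigma=se).cdf(0.0)


def classical_two_sided(beta_hat: float, se: float) -> float:
    """Two-sided z-test p-value for H0: beta_k = 0."""
    z = abs(beta_hat / se)
    return 2.0 * (1.0 - NormalDist().cdf(z))


if __name__ == "__main__":
    beta_hat, se = 0.8, 0.4  # hypothetical estimate and standard error
    p_neg = bayes_prob_negative(beta_hat, se)
    print("P(beta_k < 0 | Y):     ", p_neg)
    print("twice the smaller tail:", 2.0 * min(p_neg, 1.0 - p_neg))
    print("classical two-sided p: ", classical_two_sided(beta_hat, se))
```

With an informative prior the posterior mean and variance shift away from the least squares values, and the two quantities agree only approximately.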

Bayesians have an alternative to merely searching over models, however. We can fit a mixture model (George and McCulloch 1993, JASA; Kuo and Mallick 1998, Sankhyā B) that incorporates all the models we've searched over into a single model, and calculate the $p$-value from that.
