Happy New Year, It's Too Late

A couple of people wished me a happy new year yesterday, Jan 16th. But do you realize the year is already 1/24th over? From R, rounding by W:

> 16/365
[1] 0.0438
> 1/24
[1] 0.0417
> 15/365
[1] 0.0411

Somewhere between the 15th and the 16th we crossed the divide from less than 1/24th to more than 1/24th over. Today being the 17th, your year is 4.66% over.  Though you can wait until midnight to celebrate that particular milestone.  For your planning purposes, when January ends,

> 31/365
[1] 0.08493151
> 1/12
[1] 0.08333333
> 30/365
[1] 0.08219178

we will have passed the 1/12th point of the year.  
Whatever happened to two thousand and twelve?    
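For anyone who wants to redo the arithmetic, here is a small Python sketch of the same calculation as the R lines above. Like those R lines, it assumes a 365-day year and counts whole days elapsed. The function names are made up for illustration.

```python
import math
from fractions import Fraction

DAYS_IN_YEAR = 365  # assumes a non-leap year, as the R calculations do


def fraction_elapsed(days_done: int) -> Fraction:
    """Fraction of the year that is over once `days_done` whole days have passed."""
    return Fraction(days_done, DAYS_IN_YEAR)


def first_day_past(milestone: Fraction) -> int:
    """Smallest whole number of days d with d / DAYS_IN_YEAR > milestone."""
    return math.floor(milestone * DAYS_IN_YEAR) + 1


for label, milestone in [("1/24", Fraction(1, 24)), ("1/12", Fraction(1, 12))]:
    d = first_day_past(milestone)
    print(f"{label} of the year is passed once {d} days are done")

print(f"after 17 days: {float(fraction_elapsed(17)) * 100:.2f}% of the year is over")
```

Exact fractions avoid any floating-point doubt about which side of the 1/24 or 1/12 line a given day falls on.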

 


Clarity and Kindness

I'm editing a generally well-written, near-final draft of a biostatistics paper. Worth broadcasting are several writing problems that occur in almost all grad student writing. 

  • Don't denigrate your contributions. 
    • Original: A simple way to achieve this ... 
    • Edit: A way to achieve this ...
    • Comment: Be respectful of your contributions. Are you so close to your own solution that you can't see how important it is? Perhaps you've forgotten how innovative your solution was, given how long you've been living with it. Modesty, whether real or false, is not rewarded in academia. Besides, if you're really a scientist (and you are if you're a statistician), honesty is an important characteristic. Being honest about the importance of your work may not be easy, but it is important. Your work may be mathematically simple, but if you describe your idea as simple, readers will assume you meant that the idea is trivial.
  • Don't "represent."
    • Original:  ... [statement of key idea] because this represents [key idea alternative] ...
    • Edit: ... [key idea] because this is [key idea alternative] ...
    • Comment: "Represents" is wishy-washy and could imply any of a number of relationships. Be firm. If A and B are the same thing, say A is B, not A represents B. 
  • Use the same language every time.
    • First Original: dispersion around $x$
    • Second Original: dispersion
    • Edit both times: dispersion around $x$ 
    • Comment: Apparently, in the original text, there can be more than one dispersion. Describing the dispersion as "around $x$" implies there are, or could be, other kinds of dispersion not around $x$. Hence the need to keep the modifier every time the term is repeated.
  • Plot, don't show.
    • Original: Figure 2 shows ...
    • Edit: Figure 2 plots ...
    • Comment: Or: Figure 2 is ... . Figure 2 doesn't show anything if you're not well enough educated to understand the plot in the first place. Figure 2 contains the plot, but Figure 2 doesn't show anything. 

 

Why shouldn't I dichotomize my outcome variable?

My collaborators periodically want to dichotomize a continuous outcome, such as a depression scale, into a binary depressed/not depressed variable. Another popular one is Body Mass Index (BMI), which gets classified into obese/not obese. Every time this arises, I get to discourage them from dichotomizing and I get to explain why. Sometimes dichotomization is called "analyzing caseness". Here are some of the reasons why we shouldn't directly analyze caseness. 

  1. Dichotomization is a bad idea because it throws away useful information. Dichotomizing takes a nice plump juicy continuous variable with lots of information and turns it into a scrawny binary 0-1 single bit of datum with very little information. 
    1. This is like having your computer person take your hard drives with all that hard won data on them, and having her throw out 3 out of every 4 hard drives. (She wouldn't do that really. So why would you?) 
  2. Caseness treats people who are similar but on opposite sides of the cut point as very different. Consider IQ score, a continuous variable with (say) mean 100 and sd 15. Cutting the data at Y=100 divides people into above or below average. 
    1. A person who has an IQ of 99 is treated as hugely different from a person with an IQ of 101. 
    2. Conversely, a person with IQ 115 is treated as different from a person with IQ 85, but the difference between those two people is treated the same as the difference between the IQ 99 and IQ 101 pair. 
    3. Two people, one with IQ 102 and one with IQ 201, are treated identically. Hmmmm.
  3. On average, dichotomizing a continuous variable will lead to larger standard errors, smaller effect sizes, less power, and missed effects. 
  4. Solution!  Do not throw the baby out with the bathwater. [Actually bathwater is a solution too.]
    1. We can analyze the continuous variable and after we are done with the analysis, if it is of interest, we can convert to caseness at the end of the analysis, and draw conclusions about the probability (or odds) of caseness in the treatment group versus control group, or among men versus among women. 
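Point 3 in the list above is easy to check by simulation. The sketch below uses a hypothetical setup, not anything from our editorial: two groups of 50 with normal outcomes whose means differ by half a standard deviation. Each simulated data set is analyzed twice, once with a large-sample two-sample z test on the continuous outcome and once with a two-proportion z test after cutting at 0 (the control mean). The effect size, sample size, cut point, seed, and the use of z tests rather than t tests are all assumptions made to keep the code short and free of dependencies.

```python
import math
import random
from statistics import NormalDist, fmean, variance

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96


def continuous_test(x, y):
    """Large-sample two-sample z test on the raw continuous outcome."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return abs(fmean(x) - fmean(y)) / se > Z_CRIT


def dichotomized_test(x, y, cut):
    """Two-proportion z test after recoding each value as a case if > cut."""
    a = sum(v > cut for v in x)
    b = sum(v > cut for v in y)
    n1, n2 = len(x), len(y)
    p_pool = (a + b) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:  # every observation fell on the same side of the cut
        return False
    return abs(a / n1 - b / n2) / se > Z_CRIT


def compare_power(effect=0.5, n=50, cut=0.0, reps=2000, seed=2012):
    """Rejection rates of both tests on the same simulated data sets.

    All defaults are hypothetical settings chosen for illustration.
    """
    rng = random.Random(seed)
    hits_cont = hits_dich = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n)]
        hits_cont += continuous_test(treated, control)
        hits_dich += dichotomized_test(treated, control, cut)
    return hits_cont / reps, hits_dich / reps


power_cont, power_dich = compare_power()
print(f"power, continuous outcome:   {power_cont:.2f}")
print(f"power, dichotomized outcome: {power_dich:.2f}")
```

With these settings the continuous analysis should reject noticeably more often than the dichotomized one on the very same data. The exact rates depend on the seed and the settings, so treat them as illustrative.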

Assuming a normally distributed outcome, given a point estimate $\hat{\mu}_{tmt}$ for the mean of the treatment group, an estimated population standard deviation $\hat{s}$, and a cut point $c$ above which a person counts as a case, we can calculate the probability of caseness in the treatment group as $\Phi\left( (\hat{\mu}_{tmt} - c) / \hat{s} \right)$, where $\Phi(z)$ is the cumulative distribution function of the standard normal. We can do the same computation in the control group. To report on significance or not, I'd use the test that compares $\mu_{tmt}$ to $\mu_{cntl}$, though that isn't quite the same thing. To get a standard error of the difference in probabilities, I'd probably run a simple simulation that incorporated the uncertainty in $\hat{\mu}_{tmt}$, $\hat{\mu}_{cntl}$, and $\hat{s}$ and also included any covariance between the tmt and cntl mean estimates. Actually, it is much easier to run a Bayes analysis and use the McMC (Marked-up chain Monet Carla) output to estimate the uncertainty in the differences in probabilities of caseness. 
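Here is a minimal Python sketch of that simple simulation. It assumes a normal outcome and defines caseness as scoring above the cut point $c$. It draws the two mean estimates from a bivariate normal with correlation rho, and it uses a plug-in approximation for the uncertainty in $\hat{s}$, drawing $s^2$ as $\hat{s}^2 \chi^2_{df}/df$. All inputs in the usage lines (means, standard errors, rho, df, and c) are hypothetical numbers, not results from any study. This is the simulation route, not the Bayesian analysis. With McMC output you would instead compute the two probabilities at each posterior draw.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def prob_caseness(mu, s, c):
    """P(Y > c) for Y ~ Normal(mu, s), i.e. Phi((mu - c) / s)."""
    return STD_NORMAL.cdf((mu - c) / s)


def simulate_diff(mu_tmt, mu_cntl, se_tmt, se_cntl, rho, s_hat, df, c,
                  n_draws=20000, seed=1):
    """Point estimate and simulation SE of P(case | tmt) - P(case | cntl).

    Assumptions (a sketch, not a definitive method):
    - the two mean estimates are bivariate normal with correlation rho;
    - s^2 is drawn as s_hat^2 * chi2(df) / df, a plug-in approximation.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        m_tmt = mu_tmt + se_tmt * z1
        # Cholesky step so that corr(m_tmt, m_cntl) = rho.
        m_cntl = mu_cntl + se_cntl * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        # A chi-square(df) variate is Gamma(shape=df/2, scale=2).
        s = s_hat * math.sqrt(rng.gammavariate(df / 2, 2.0) / df)
        diffs.append(prob_caseness(m_tmt, s, c) - prob_caseness(m_cntl, s, c))
    point = prob_caseness(mu_tmt, s_hat, c) - prob_caseness(mu_cntl, s_hat, c)
    return point, fmean(diffs), stdev(diffs)


# Hypothetical inputs, for illustration only.
point, mean_draw, se = simulate_diff(
    mu_tmt=12.0, mu_cntl=10.0, se_tmt=0.5, se_cntl=0.5, rho=0.0,
    s_hat=4.0, df=98, c=11.0)
print(f"difference in P(case): {point:.3f}, simulation SE {se:.3f}")
```

Setting rho to a nonzero value is how the covariance between the tmt and cntl mean estimates enters. In a simple two-independent-groups design it would be 0.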

I've co-authored an editorial in Medical Decision Making on the subject of dichotomization. At a minimum, Medical Decision Making will make you jump through extra hoops if you discretize continuous variables before you can publish. Remember: Don't drink and dichotomize! Though if you must dichotomize, choosy mothers choose DON'T.   

Reference
Dawson, Neal V. and Weiss, Robert (2012). Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid. Medical Decision Making 32, 225--226. DOI: 10.1177/0272989X12437605

Guest Post: The Importance of Keeping Your CV/Resume Current

Guest post by Robin Jeffries, copied from the niece* blog NorCalBiostat.

My graduate advisor was adamant about me keeping my CV current. Every little consulting project, every award, presentation or co-authorship on a paper had to be on there. When I shared my joy at getting an award or having a conference presentation or poster accepted, his immediate first statement was “Is it on your CV yet?” Well, perhaps after a congratulations.

It’s such a simple thing to do, but also a simple thing to keep putting off and then forget. Over the past few years I’ve gotten better at adding things almost immediately, and it has paid off so many times. Right now I’m very casually looking at what my next career step will be. When I find something that I just can’t pass up, I am always thankful that my CV needs only minor changes and updates. Applying for jobs can be stressful enough. Keeping your CV up to date makes it one less thing to worry about. Save your energy for your cover letter.

And don’t be afraid to change the style of your resume now and again. Yes, it can be a lot of work, but tastes change, and what you thought was an amazing font may not look so good a few months later.

The same concept applies to blogs, but that habit will take me much longer to form.

I concur.  

* Robin was my doctoral student. This is my blog. NorCalBiostat is her blog. The doctoral student of my doctoral student is my grand-student. Andy Gelman regularly refers to his blog's sister blog. Therefore she is my blog's sister, and her blog is my blog's niece blog. Does Ancestry.com have any documentation on this?