Was doing a data analysis and power calculation for a proposed group randomized study, and came across an interesting feature where the resulting model for the data will necessarily be different for treatment and control. Treatment will have 3 hierarchical levels; control will have 2 levels.
In this study, 24 neighborhoods are randomized either to Soccer League + Vocational Training (SL-V), Soccer League (SL) or Control Condition (CC), 8 each. Within an SL-V or SL neighborhood, young men will be assigned to particular soccer teams run by a well trained soccer coach. Neighborhoods are different and teams will be quite different depending on the coach. Outcomes are to be measured longitudinally on young men. Thus any analysis on young men's outcomes needs to account for the nesting structure:
- Observations within young men,
- Young men within Team,
- Team within Neighborhoods.
But this is for the SL-V and SL conditions. The CC condition has no assignment to soccer teams, and thus no nesting of young men within teams, only within neighborhoods.
Two observations on a given young man will be more similar than observations from two different young men. Two young men from the same soccer team will be more similar than two young men from different teams. And two young men from the same neighborhood are more similar to each other than two young men from different neighborhoods. But the control condition has no soccer league, and no assignment to soccer teams. Men in the control group are thus nested within neighborhoods only, they have no soccer teams to be nested in. Therefore the correlation structure/nesting structure is different for the CC neighborhoods compared to SL-V and SL neighborhoods.
Consider the simplest of random effects models for the resulting data, where we have random intercepts for young men (YM), teams (T) and neighborhoods (N). There will be variances sigma_YM, sigma_T and sigma_N.
Soccer neighborhoods have sigma_T and sigma_N. Control CC neighborhoods have sigma_N only. Will sigma_N for CC neighborhoods be the same as sigma_N for soccer neighborhoods? That is, will the sigma_T random effects only add variance to the soccer neighborhoods, or could it supplant some of the sigma_N variance in those neighborhoods? Do I need to have separate sigma_N's for CC and SL-V/SL neighborhoods? This is a neighborhood intervention, neighborhoods are not that large, and the intervention conditions could have rather substantial feedback loops within the neighborhood.
There will be 8 neighborhoods for each of the three SL-V, SL and CC conditions. That's not a lot of degrees of freedom for estimating the variances. And the SL-V and SL interventions could be enough different that they have different sigma_N and sigma_T variances from each other. And getting around to the sigma_YM, those could well be different between conditions as well. If neighborhoods are different enough, I suppose sigma_YM could even differ by neighborhood or by condition.
Could be a fun study to analyze.
Bayesian methods to the rescue. Using maximum likelihood (ML) will be highly problematic, with 24 neighborhoods and 2 to 3 teams per neighborhood. First, ML software doesn't allow us to build a model then estimate it, the models are all canned, and I'm not sure having no sigma_T random effects for the CC young men is even specifiable in standard ML software. There must be a way to trick SAS Proc Mixed or Glimmix into allowing this, but I can't think of it off the top of my head. Perhaps assigning every person in CC to the same team? Or every young man in a CC neighborhood to the same team? Those aren't quite right, what if sigma_T is larger than sigma_N? When ML software sets variances to zero, we have an even bigger problem, as small variances and zero variances are very different and give very different answers for things like confidence intervals, degrees of freedom and p-values.
If I need separate variances sigma_N for CC, SL and SL-V, that's only 8 neighborhoods per, and again ML often has trouble estimating variances from small degrees of freedom (df).
Bayes software and methods are more flexible, perhaps because they're not as highly evolved as ML software. Proper priors for the variances are necessary, but sensible proper priors for variances are not that hard to specify in hierarchical models, particularly for binary outcomes, and even for count and continuous outcomes it's not that hard. I'm okay for purposes of this conversation in having a flat prior for the fixed effects parameters. Bayesian MCMC inference is straightforward, and the question becomes setting up the models and priors for the first time, and then running a zillion copies of the analysis, as we have many many outcomes to look at. One issue is how much tuning do we need for the variance priors across different outcomes. I think I can deal with that. In contrast, ML software doesn't acknowledge that this model exists, ML will frequently set variance estimates to zero, necessitating teeth-gnashing regarding how to proceed, and with complex random effects models, and many small degrees of freedom, one wonders how sensible or accurate all those Welch/Satterthwaite approximations might be. In contrast, Bayes methods will grab all the information in the data available, accommodate all the uncertainty, and not flinch at small degrees of freedom. Unless our prior explicitly allows for variance parameters equal to zero (as in a mixture continuous and lump at zero prior), then the Bayesian model will force all variances to be non-zero. Asymptotic approximations not required. Success, and we can worry about the science, and not worry about recalcitrant software and algorithms.