Mixed models for ANOVA designs with one observation per unit of observation and cell of the design

Together with David Kellen I am currently working on an introductory chapter to mixed models for a book edited by Dan Spieler and Eric Schumacher (the current version can be found here). The goal is to provide a theoretical and practical introduction that is targeted mainly at experimental psychologists, neuroscientists, and others working with experimental designs and human data. The practical part obviously focuses on R, specifically on lme4 and afex.

One part of the chapter was supposed to deal with designs for which the maximal random effects structure justified by the design cannot be estimated because there is only one observation per participant and cell of the design. Such designs are the classical repeated-measures ANOVA designs, as ANOVA cannot deal with replicates at the cell level (i.e., replicates are usually aggregated to yield one observation per cell and unit of observation). Based on my previous thoughts, which turned out to be wrong, we wrote the following:

Random Effects Structures for Traditional ANOVA Designs

The estimation of the maximal model is not possible when there is only one observation per participant and cell of a repeated-measures design. These designs are typically analyzed using a repeated-measures ANOVA. Currently, there are no clear guidelines on how to proceed in such situations, but we will try to provide some advice. If there is only a single random effects grouping factor, for example participants, we feel that instead of a mixed model, it is appropriate to use a standard repeated-measures ANOVA that addresses sphericity violations via the Greenhouse-Geisser correction.

One alternative strategy that employs mixed models and that we do not recommend consists of using the random-intercept only model or removing the random slopes for the highest within-subject interaction. The resulting model assumes invariance of the omitted random effects across participants. If this assumption is violated, such a model produces results that cannot be trusted. […]

Fortunately, we asked Jake Westfall to take a look at the chapter and Jake responded:

I don’t think I agree with this. In the situation you describe, where we have a single random factor in a balanced ANOVA-like design with 1 observation per unit per cell, personally I am a proponent of the omit-the-highest-level-random-interaction approach. In this kind of design, the random slopes for the highest-level interaction are perfectly confounded with the trial-level error term (in more technical language, the model is only identifiable up to the sum of these two variance components), which is what causes the identifiability problems when one tries to estimate the full maximal model there. (You know all of this of course.) So two equivalent ways to make the model identifiable are to (1) omit the error term, i.e., force the residual variance to be 0, or (2) omit the random slopes for the highest-level interaction. Both of these approaches should (AFAIK) result in a statistically equivalent model, but lme4 does not provide an easy way to do (1), so I generally recommend (2). The important point here is that the standard errors should still be correct in either case — because these two variance components are confounded, omitting e.g. the random interaction slopes simply causes that omitted variance component to be implicitly added to the residual variance, where it is still incorporated into the standard errors of the fixed effects in the appropriate way (because the standard error of the fixed interaction looks roughly like sqrt[(var_error + var_interaction)/n_subjects]). I think one could pretty easily put together a little simulation that would demonstrate this.

Hmm, that sounds very reasonable, but could my intuition about random effects structures and mixed models really be that wrong? To investigate this, I followed Jake's advice and coded a short simulation. As it turns out, Jake is right and I was wrong.

In the simulation we will simulate a simple repeated-measures design with one factor with three levels. Importantly, each unit of observation will have only one observation per factor level. We will then fit the simulated data with both a repeated-measures ANOVA and a random-intercept-only mixed model and compare their p-values. Note again that for such a design we cannot estimate random slopes for the condition effect.

First, we need a few packages and set some parameters for our simulation:

require(afex)
set_sum_contrasts() # for orthogonal sum-to-zero contrasts
require(MASS)       # for mvrnorm()

NSIM <- 1e4    # number of simulated data sets
NPAR <- 30     # number of participants
NCELLS <- 3    # number of cells (i.e., conditions)

Now we need to generate the data. For this I employed an approach that is clearly not the most parsimonious, but which most closely follows the formulation of a mixed model with random variability in the condition effect plus residual variance on top (i.e., the two confounded variance components).

We first create a bare-bones data.frame with participant id and condition columns and a corresponding model.matrix. Then we draw the three random parameters per participant (i.e., the intercept and the two parameters coding the three conditions) from a zero-centered multivariate normal with a specified variance-covariance matrix. We then loop over the participants and compute the predictions resulting from the three random-effects parameters. Only after this do we add uncorrelated residual variance to the observations of each simulated data set.

dat <- expand.grid(condition = factor(letters[seq_len(NCELLS)]),
                   id = factor(seq_len(NPAR)))
head(dat)
#   condition id
# 1         a  1
# 2         b  1
# 3         c  1
# 4         a  2
# 5         b  2
# 6         c  2

mm <- model.matrix(~condition, dat)
head(mm)
#   (Intercept) condition1 condition2
# 1           1          1          0
# 2           1          0          1
# 3           1         -1         -1
# 4           1          1          0
# 5           1          0          1
# 6           1         -1         -1

## variance-covariance matrix of the random effects:
## variances of 1, covariances (i.e., correlations) of 0.6
Sigma_c_1 <- matrix(0.6, NCELLS, NCELLS)
diag(Sigma_c_1) <- 1
## one NPAR x NCELLS matrix of random-effects parameters per data set
d_c_1 <- replicate(NSIM, mvrnorm(NPAR, rep(0, NCELLS), Sigma_c_1),
                   simplify = FALSE)
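
As a quick sanity check (an addition to the original script), we can verify that the simulated random-effects parameters of a single data set roughly recover the specified variance-covariance matrix. With only 30 participants per data set, the estimate will of course be noisy:

## empirical covariance of the first set of simulated random effects;
## should be in the vicinity of Sigma_c_1
round(cov(d_c_1[[1]]), 2)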

gen_dat <- vector("list", NSIM)
for (i in seq_len(NSIM)) {
  gen_dat[[i]] <- dat
  gen_dat[[i]]$dv <- NA_real_
  ## predictions from each participant's random-effects parameters
  for (j in seq_len(NPAR)) {
    rows <- (j - 1) * NCELLS + seq_len(NCELLS)
    gen_dat[[i]][rows, "dv"] <- mm[rows, ] %*% d_c_1[[i]][j, ]
  }
  ## add uncorrelated residual variance on top
  gen_dat[[i]]$dv <- gen_dat[[i]]$dv + rnorm(nrow(mm), 0, 1)
}
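
As an aside, the participant loop is not strictly necessary. Because the rows of dat are ordered by participant, the same data can be generated in a vectorized fashion. Here is a sketch that should be equivalent to the loop above (gen_dat2 is a hypothetical name, not used in the remainder of the post):

## vectorized alternative: pick each row's participant-specific
## random-effects parameters and combine them with the model matrix
gen_dat2 <- lapply(d_c_1, function(d) {
  out <- dat
  out$dv <- rowSums(mm * d[as.integer(out$id), ]) + rnorm(nrow(mm), 0, 1)
  out
})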

Now we only need functions that estimate the ANOVA and the mixed model for each data set and return the p-values, and then we loop over all data sets.

## functions returning the p-values for ANOVA and mixed model
within_anova <- function(data) {
  suppressWarnings(suppressMessages(
    a <- aov_ez(id = "id", dv = "dv", data, within = "condition",
                return = "univariate", anova_table = list(es = "none"))
  ))
  c(without = a[["univariate.tests"]][2, 6],  # uncorrected p-value
    gg = a[["pval.adjustments"]][1, 2],       # Greenhouse-Geisser p-value
    hf = a[["pval.adjustments"]][1, 4])       # Huynh-Feldt p-value
}

within_mixed <- function(data) {
  suppressWarnings(
    m <- mixed(dv ~ condition + (1 | id), data, progress = FALSE)
  )
  c(mixed = anova(m)$`Pr(>F)`)  # p-value of the condition effect
}

p_c1_within <- vapply(gen_dat, within_anova, rep(0.0, 3))
m_c1_within <- vapply(gen_dat, within_mixed, 0.0)
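
As an aside, we can also confirm Jake's identifiability point directly. With one observation per participant and cell, the maximal model has as many random effects as observations (30 participants × 3 = 90), so lme4 should refuse to fit it. A minimal sketch (the exact error message may differ across lme4 versions):

## the maximal model is not estimable here: 90 random effects
## for 90 observations (plus the residual variance)
m_max <- try(mixed(dv ~ condition + (condition | id), gen_dat[[1]],
                   progress = FALSE))
## lme4 should complain that the number of observations is not
## larger than the number of random effects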

The following graph shows the results (GG denotes the results obtained with the Greenhouse-Geisser adjustment for sphericity violations).

ylim <- c(0, 700)
par(mfrow = c(1,3))
hist(p_c1_within[1,], breaks = 20, main = "ANOVA (default)", xlab = "p-value", ylim=ylim)
hist(p_c1_within[2,], breaks = 20, main = "ANOVA (GG)", xlab = "p-value", ylim=ylim)
hist(m_c1_within, breaks = 20, main = "Random-Intercept Model", xlab = "p-value", ylim=ylim)

What these graphs clearly show is that the p-value distributions of the standard repeated-measures ANOVA and the random-intercept mixed model are virtually identical. My intuition was wrong and Jake was right.

We also see that for both the (uncorrected) ANOVA and the mixed model the rate of significant findings with p < .05 is slightly above the nominal level. More specifically:

mean(p_c1_within[1,] < 0.05) # ANOVA default
# [1] 0.0684
mean(p_c1_within[2,] < 0.05) # ANOVA GG
# [1] 0.0529
mean(p_c1_within[3,] < 0.05) # ANOVA HF
# [1] 0.0549
mean(m_c1_within < 0.05)     # random-intercept mixed model
# [1] 0.0701
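
To put these numbers into perspective, it helps to consider the Monte Carlo standard error of an estimated rejection rate near the nominal level (a quick back-of-the-envelope addition to the original post):

## Monte Carlo standard error of a proportion of .05
## estimated from NSIM = 10,000 simulations
sqrt(0.05 * 0.95 / NSIM)
# [1] 0.002179449

By this yardstick, the default ANOVA and the mixed model (both around .07) are clearly above the nominal level, whereas the GG-corrected rate is within roughly 1.5 standard errors of .05.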

These additional results suggest that one may also need to adjust the degrees of freedom of mixed models for violations of sphericity. But that is not the topic of today's post.

To sum up, this simulation shows that removing the highest-order random slope is the right decision if one wants to use a mixed model for a design with one observation per participant and cell of the design while still implementing the 'maximal random effects structure' that the data permit.
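
If the random-intercept model is indeed statistically equivalent to the uncorrected ANOVA here, the two should agree not only in distribution but essentially on each individual data set. A quick way to check this (a sketch, not part of the original simulation) is to plot the p-values against each other:

## per-data-set comparison of uncorrected ANOVA and
## random-intercept model p-values; statistically equivalent
## models should fall on the identity line
plot(p_c1_within[1, ], m_c1_within,
     xlab = "ANOVA (default) p-value",
     ylab = "random-intercept model p-value")
abline(0, 1)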

One more thing to note: Ben Bolker raised the same issue and pointed us to one of his example analyses of the starling data that is relevant to the current question (alternatively, see the more up-to-date Rmd file). We are very grateful that Jake and Ben took the time to go through our chapter!

You can also download the RMarkdown file of the simulation.


Comments

  • Hi, thank you for the post. Could you explain what would be the reason to choose an LMEM with a random intercept vs. a repeated-measures ANOVA? What would be the advantage of having a maximal random effects structure in the model?

    Jiah Yoo (2019-08-15)
  • Hi – interesting and helpful post, but it is not clear to me why the random slopes are not included in the model.

    Sim (2024-02-01)
    • The random slopes cannot be estimated in the model (i.e., such a model would not be uniquely identifiable). The random slopes are confounded with the residual error term.

      Henrik Singmann (2024-03-22)
