Analyze the mean (or median) and not the max response

This is an update of Paired t-test as a special case of linear model and hierarchical model

Figure 2A of the paper Meta-omics analysis of elite athletes identifies a performance-enhancing microbe that functions via lactate metabolism uses a paired t-test to compare endurance performance in mice treated with a control microbe (Lactobacillus bulgaricus) and a test microbe (Veillonella atypica) in a cross-over design (so each mouse was treated with both bacteria). The data are in the “Supplementary Tables” Excel file, which includes the raw data for the main paper.

Endurance performance of each mouse was measured three times under each treatment level. The authors used two different analyses of the data: 1) a paired t-test between the treatments using the maximum performance over the three trials, 2) a linear mixed model with mouse ID as a random intercept (and sequence and day as covariates). A problem with the estimation of a treatment effect using the maximum response (the authors used the maximum of three endurance trials) is that the variance of the estimate of a maximum is bigger than the variance of the estimate of a mean. Consequently, the variance of the effect will be bigger, with more inflated large effects (Type M error) and more effects with the wrong sign (Type S error).

Here I repeat the analysis from the earlier post (which focussed on the paired t-test as a special case of a linear model) but add the analysis of the means and a simulation to show the effect on the variance of the estimated treatment effect.

Setup

library(ggplot2)
library(ggpubr)
library(readxl)
library(data.table)
library(lmerTest)
library(emmeans)
library(mvtnorm)

bookdown_it <- TRUE
if(bookdown_it==TRUE){
  data_path <- "../data"
  out_path <- "../output"
  source("../../../R/clean_labels.R")
}else{
  data_path <- "../content/data"
  out_path <- "../content/output"
  source("../../R/clean_labels.R")
}

Import

folder <- "Data from Meta-omics analysis of elite athletes identifies a performance-enhancing microbe that functions via lactate metabolism"
file_i <- "41591_2019_485_MOESM2_ESM.xlsx"
file_path <- paste(data_path, folder, file_i, sep="/")
sheet_i <- "ST3 Veillonella run times"
range_vec <- c("a5:d21", "f5:i21", "a24:d40", "f24:i40")
fig2a.orig <- data.table(NULL)
poop <- c("Lb", "Va")
for(week_i in 1:4){
  range_i <- range_vec[week_i]
  if(week_i %in% c(1, 3)){
    treatment <- rep(c(poop[1], poop[2]), each=8)
  }else{
    treatment <- rep(c(poop[2], poop[1]), each=8)
  }
  fig2a.orig <- rbind(fig2a.orig, data.table(week=week_i, treatment=treatment, data.table(
    read_excel(file_path, sheet=sheet_i, range=range_i))))
}

# clean column names
setnames(fig2a.orig, old="...1", new="ID")
setnames(fig2a.orig, old=colnames(fig2a.orig), new=clean_label(colnames(fig2a.orig)))

# add treatment sequence
fig2a.orig[, sequence:=rep(rep(rep(c("LLLVVV", "VVVLLL"), each=8), 2), 2)]

# wide to long
fig2a <- melt(fig2a.orig, id.vars=c("week", "ID", "treatment", "sequence"),
              variable.name="day",
              value.name="time")

Because of the cross-over design, the ID is not in order within each level of treatment. Make a wide data table with new treatment columns matched by ID.

# get max for each week x ID x treatment combo
fig2a.max <- fig2a[, .(time_max=max(time)), by=.(week, ID, treatment)]

# match the two treatments applied to each ID
va <- fig2a.max[treatment=="Va", ]
lb <- fig2a.max[treatment=="Lb", ]
fig2a_wide <- merge(va, lb, by="ID")

Comparison

paired t-test

# paired t
res.t <- t.test(fig2a_wide[, time_max.x], fig2a_wide[, time_max.y], paired=TRUE)
res.t

## 
##  Paired t-test
## 
## data:  fig2a_wide[, time_max.x] and fig2a_wide[, time_max.y]
## t = 2.4117, df = 31, p-value = 0.02199
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   20.47859 244.89641
## sample estimates:
## mean of the differences 
##                132.6875

linear model

# as a linear model
y <- fig2a_wide[, time_max.x] - fig2a_wide[, time_max.y] # Va - Lb
res.lm <- lm(y ~ 1)
coef(summary(res.lm))

##             Estimate Std. Error  t value   Pr(>|t|)
## (Intercept) 132.6875   55.01749 2.411733 0.02198912

hierarchical (linear mixed) model with random intercept

# as multi-level model with random intercept
res.lmer <- lmer(time_max ~ treatment + (1|ID), data=fig2a.max)
coef(summary(res.lmer))

##              Estimate Std. Error       df   t value     Pr(>|t|)
## (Intercept) 1023.7187   43.37421 59.71689 23.602014 5.703668e-32
## treatmentVa  132.6875   55.01752 30.99995  2.411732 2.198920e-02

Re-analysis – What about the mean and not max response?

Paired t-test of means

# t test
# get mean for each week x ID x treatment combo
fig2a.mean <- fig2a[, .(time_mean=mean(time)), by=.(week, ID, treatment)]
va <- fig2a.mean[treatment=="Va", ]
lb <- fig2a.mean[treatment=="Lb", ]
fig2a_wide <- merge(va, lb, by="ID")
res.t <- t.test(fig2a_wide[, time_mean.x], fig2a_wide[, time_mean.y], paired=TRUE)
res.t

## 
##  Paired t-test
## 
## data:  fig2a_wide[, time_mean.x] and fig2a_wide[, time_mean.y]
## t = 1.7521, df = 31, p-value = 0.08964
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12.38639 163.40722
## sample estimates:
## mean of the differences 
##                75.51042

The p-value of the t-test using max performance was 0.02 and that using mean performance is 0.09. These aren’t too different (do not be distracted by the fact that they fall on opposite sides of 0.05) but neither inspires much confidence that the effect will reproduce.

Hierarchical model

# hierarchical model
res2.lmer <- lmer(time ~ treatment + (1|ID), data=fig2a)
coef(summary(res2.lmer))

##              Estimate Std. Error        df   t value     Pr(>|t|)
## (Intercept) 848.25000   32.09252  65.82282 26.431394 9.129879e-37
## treatmentVa  75.51042   36.83330 159.00005  2.050059 4.200108e-02

The p-value is 0.04, which differs from that for the t-test because this model is fit to all the data and not just the mean values.

Simulation

A simulation of Type I, II, M and S error that arises when estimating the treatment effect from differences in the maximum response.

set.seed(1)
do_sim <- FALSE
if(do_sim==TRUE){
  n <- 32
  p <- 3 # number of trials
  mu <- coef(summary(res2.lmer))[1, "Estimate"]
  beta <- coef(summary(res2.lmer))[2, "Estimate"]
  sigma <- as.data.frame(VarCorr(res2.lmer))[2, "sdcor"]
  sigma_id <- as.data.frame(VarCorr(res2.lmer))[1, "sdcor"]
  icc <- sigma_id^2/(sigma_id^2 + sigma^2)
  icov <- matrix(icc*sigma_id^2, nrow=3, ncol=3) # covariance
  diag(icov) <- sigma^2
  
  niter <- 5000
  b.max <- numeric(niter)
  b.mean <- numeric(niter)
  b.lmer <- numeric(niter)
  p.max <- numeric(niter)
  p.mean <- numeric(niter)
  p.lmer <- numeric(niter)
  mean.max <- numeric(niter)
  mean.mean <- numeric(niter)
  sim_table <- data.table(NULL)
  
  mu_i <- rnorm(n, mean=mu, sd=sigma_id) # true performance in treatment A
  
  for(b1 in c(0, beta)){
    for(iter in 1:niter){
      # performance_A <- matrix(rnorm(n*p, mean=mu, sd=sqrt(sigma^2 + sigma_id^2)), nrow=n)
      # performance_B <- matrix(rnorm(n*p, mean=mu, sd=sqrt(sigma^2 + sigma_id^2)), nrow=n) + b1
      
      # performance_A <- matrix(rnorm(n*p, mean=mu_i, sd=sigma), nrow=n)
      # performance_B <- matrix(rnorm(n*p, mean=mu_i, sd=sigma), nrow=n) + b1
      # 
      performance_A <- rmvnorm(n, mean=rep(0, p), sigma=icov) + matrix(mu_i, nrow=n, ncol=p)
      performance_B <- rmvnorm(n, mean=rep(0, p), sigma=icov) + matrix(mu_i, nrow=n, ncol=p) + b1
      
      # t test of max
      A <- apply(performance_A, 1, max)
      B <- apply(performance_B, 1, max)
      mean.max[iter] <- mean(A)
      t_res <- t.test(B, A, paired=TRUE)
      b.max[iter] <- t_res$estimate
      p.max[iter] <- t_res$p.value
      
      # t test of mean
      A <- apply(performance_A, 1, mean)
      B <- apply(performance_B, 1, mean)
      mean.mean[iter] <- mean(A)
      t_res <- t.test(B, A, paired=TRUE)
      b.mean[iter] <- t_res$estimate
      p.mean[iter] <- t_res$p.value
      
      # # hierarchical model
      performance <- rbind(data.table(treatment="A", ID=1:n, performance_A),
                           data.table(treatment="B", ID=1:n, performance_B))
      performance <- melt(performance, id.vars=c("ID", "treatment"),
                          variable.name="day",
                          value.name="time")
      sim.lmer <- lmer(time ~ treatment + (1|ID), data=performance)
      b.lmer[iter] <- coef(summary(sim.lmer))[2, "Estimate"]
      p.lmer[iter] <- coef(summary(sim.lmer))[2, "Pr(>|t|)"]
    }
    sim_table <- rbind(sim_table, data.table(effect = b1,
                                             b_max = b.max,
                                             b_mean = b.mean,
                                             b_lmer = b.lmer,
                                             p_max = p.max,
                                             p_mean = p.mean,
                                             p_lmer = p.lmer,
                                             mean_max = mean.max,
                                             mean_mean = mean.mean))
  }
  write.table(sim_table, paste(out_path,"june-25-2019.txt",sep="/"), 
              row.names = FALSE, quote=FALSE, sep="\t")
}else{ # read file from simulation
  sim_table <- fread(paste(out_path,"june-25-2019.txt",sep="/"))
}

The SD among times using max performance is 34.1 and using mean performance is 26.5.

The increased variance of estimating the treatment using the maximum performance decreases power. The hierarchical model, which uses all the data, has the most power (at some cost to Type I?).

niter <- nrow(sim_table)/2
error_table <- data.table(Method=c("max", "mean", "lmm"),
                     "Type 1" = c(sum(sim_table[effect==0, p_max < 0.05])/niter,
                     sum(sim_table[effect==0, p_mean < 0.05])/niter,
                     sum(sim_table[effect==0, p_lmer < 0.05])/niter),
                     "Power" = c(sum(sim_table[effect!=0, p_max < 0.05])/niter,
                     sum(sim_table[effect!=0, p_mean < 0.05])/niter,
                     sum(sim_table[effect!=0, p_lmer < 0.05])/niter))
knitr::kable(error_table, digits=c(NA, 3, 3))

Method	Type 1	Power
max	0.050	0.323
mean	0.048	0.489
lmm	0.057	0.530

Type S and M error can be seen by looking at the distribution of effects (more inflated effects, and more sign effects)

sim_table_long <- melt(sim_table, id.vars=c("effect"), 
                       measure.vars=c("b_max", "b_mean", "b_lmer"),
                       variable.name="method",
                       value.name="estimate"
                       )
gg <- ggplot(data=sim_table_long, aes(x=method, y=estimate)) +
  geom_boxplot() +
  facet_grid(.~effect, labeller = "label_both") +
  NULL
gg

Run through a statistical significance filter, the mean effects, of those that are significant, estimated using the max response, the mean response, and a hierarchical model, are:

p_table_long <- melt(sim_table, id.vars=c("effect"), 
                       measure.vars=c("p_max", "p_mean", "p_lmer"),
                       variable.name="method",
                       value.name="p"
                       )
knitr::kable(sim_table_long[effect!=0 & estimate > 0 & p_table_long[, p] < 0.05, 
               .(effect=mean(estimate)), by=method], digits=c(NA, 1))

method	effect
b_max	126.8
b_mean	105.4
b_lmer	103.7

which is inflated for all three (the true effect is 75.5) but most inflated using maximum performance.

Analyze the mean (or median) and not the max response

Setup

Import

Comparison

paired t-test

linear model

hierarchical (linear mixed) model with random intercept

Re-analysis – What about the mean and not max response?

Paired t-test of means

Hierarchical model

Simulation

R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns