Blocking vs. covariate adjustment

“A more efficient design would be to first group the rats into homogeneous subsets based on baseline food consumption. This could be done by ranking the rats from heaviest to lightest eaters and then grouping them into pairs by taking the first two rats (the two that ate the most during baseline), then the next two in the list, and so on. The difference from a completely randomised design is that one rat within each pair is randomised to one of the treatment groups, and the other rat is then assigned to the remaining treatment group. Each rat in a pair is expected to eat a similar amount of food during the experiment because they have been matched on their baseline food consumption. By removing this source of variation, the comparison between rats in a pair will be mostly unaffected by the amount of food they eat, allowing treatment effects to be more easily detected.” – Lazic, Stanley E.. Experimental Design for Laboratory Biologists: Maximising Information and Improving Reproducibility . Cambridge University Press. Kindle Edition.

library(ggplot2)
library(data.table)
library(doBy)
library(lmerTest)
library(nlme)

odd <- function(x) x%%2 != 0
even  <- function(x) x%%2 == 0

Simulate data in which the response is a function of baseline_food consumption:

n <- 10^1
niter <- 1000
# create the response y as a function of baseline_food
out_cols <- c("adj", "block.rand", "block.fix", "block.adj")
b_mat <- p_mat <- matrix(NA, nrow=niter, ncol=length(out_cols))
colnames(b_mat) <-colnames(p_mat) <- out_cols
for(iter in 1:niter){
  baseline_food <- rnorm(n*2)
  beta_baseline_food <- 0.6
  y <- beta_baseline_food*baseline_food + sqrt(1-beta_baseline_food^2)*rnorm(n*2)
  
  # covariate adjustment
  # add treatment effect to half
  treatment <- as.factor(rep(c("tr", "cn"), each=n))
  beta_1 <- 1
  y[1:n] <- y[1:n] + beta_1
  fit1 <- lm(y ~ baseline_food + treatment)
  b_mat[iter, "adj"] <- coef(summary(fit1))["treatmenttr", "Estimate"]
  p_mat[iter, "adj"] <- coef(summary(fit1))["treatmenttr", "Pr(>|t|)"]
  
  # block
  treatment <- NULL
  for(i in 1:n){
    treatment <- c(treatment, sample(c("tr", "cn"), 2))
  }
  fake_data <- data.table(y=y, baseline_food=baseline_food)
  setorder(fake_data, baseline_food)
  fake_data[, treatment:=factor(treatment)]
  fake_data[, block:=factor(rep(1:n, each=2))]
  fake_data[, y_exp:=ifelse(treatment=="tr", y+1, y)]
  # fit2 <- lmer(y_exp ~ treatment + (1|block), data=fake_data)
  # b_mat[iter, "block"] <- coef(summary(fit2))["treatmenttr", "Estimate"]
  fit2 <- lme(y_exp ~ treatment, random= ~1|block, data=fake_data)
  b_mat[iter, "block.rand"] <- coef(summary(fit2))["treatmenttr", "Value"]
  p_mat[iter, "block.rand"] <- coef(summary(fit2))["treatmenttr", "p-value"]

  fit2b <- lm(y_exp ~ block + treatment, data=fake_data)
  b_mat[iter, "block.fix"] <- coef(summary(fit2b))["treatmenttr", "Estimate"]
  p_mat[iter, "block.fix"] <- coef(summary(fit2b))["treatmenttr", "Pr(>|t|)"]

  fit3 <- lm(y_exp ~ baseline_food + treatment, data=fake_data)
  b_mat[iter, "block.adj"] <- coef(summary(fit3))["treatmenttr", "Estimate"]
  p_mat[iter, "block.adj"] <- coef(summary(fit3))["treatmenttr", "Pr(>|t|)"]
}

Estimates

apply(b_mat, 2, quantile, probs=c(0.025, 0.5, 0.975))

##             adj block.rand block.fix block.adj
## 2.5%  0.2477085   0.196368  0.196368 0.1924568
## 50%   1.0007202   1.016430  1.016430 1.0146775
## 97.5% 1.7449915   1.800058  1.800058 1.7891515

Power

apply(p_mat, 2, function(x) sum(x < 0.05)/niter)

##        adj block.rand  block.fix  block.adj 
##      0.720      0.555      0.541      0.590

Conclusion: over this model space, simply adjusting for baseline food consumption is more powerful than creating blocks using baseline food consumption.

Blocking vs. covariate adjustment

R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns