Blocking vs. covariate adjustment
“A more efficient design would be to first group the rats into homogeneous subsets based on baseline food consumption. This could be done by ranking the rats from heaviest to lightest eaters and then grouping them into pairs by taking the first two rats (the two that ate the most during baseline), then the next two in the list, and so on. The difference from a completely randomised design is that one rat within each pair is randomised to one of the treatment groups, and the other rat is then assigned to the remaining treatment group. Each rat in a pair is expected to eat a similar amount of food during the experiment because they have been matched on their baseline food consumption. By removing this source of variation, the comparison between rats in a pair will be mostly unaffected by the amount of food they eat, allowing treatment effects to be more easily detected.” – Lazic, Stanley E.. Experimental Design for Laboratory Biologists: Maximising Information and Improving Reproducibility . Cambridge University Press. Kindle Edition.
library(ggplot2)
library(data.table)
library(doBy)
library(lmerTest)
library(nlme)
odd <- function(x) x%%2 != 0
even <- function(x) x%%2 == 0
Simulate data in which the response is a function of baseline_food consumption:
n <- 10^1
niter <- 1000
# create the response y as a function of baseline_food
out_cols <- c("adj", "block.rand", "block.fix", "block.adj")
b_mat <- p_mat <- matrix(NA, nrow=niter, ncol=length(out_cols))
colnames(b_mat) <-colnames(p_mat) <- out_cols
for(iter in 1:niter){
baseline_food <- rnorm(n*2)
beta_baseline_food <- 0.6
y <- beta_baseline_food*baseline_food + sqrt(1-beta_baseline_food^2)*rnorm(n*2)
# covariate adjustment
# add treatment effect to half
treatment <- as.factor(rep(c("tr", "cn"), each=n))
beta_1 <- 1
y[1:n] <- y[1:n] + beta_1
fit1 <- lm(y ~ baseline_food + treatment)
b_mat[iter, "adj"] <- coef(summary(fit1))["treatmenttr", "Estimate"]
p_mat[iter, "adj"] <- coef(summary(fit1))["treatmenttr", "Pr(>|t|)"]
# block
treatment <- NULL
for(i in 1:n){
treatment <- c(treatment, sample(c("tr", "cn"), 2))
}
fake_data <- data.table(y=y, baseline_food=baseline_food)
setorder(fake_data, baseline_food)
fake_data[, treatment:=factor(treatment)]
fake_data[, block:=factor(rep(1:n, each=2))]
fake_data[, y_exp:=ifelse(treatment=="tr", y+1, y)]
# fit2 <- lmer(y_exp ~ treatment + (1|block), data=fake_data)
# b_mat[iter, "block"] <- coef(summary(fit2))["treatmenttr", "Estimate"]
fit2 <- lme(y_exp ~ treatment, random= ~1|block, data=fake_data)
b_mat[iter, "block.rand"] <- coef(summary(fit2))["treatmenttr", "Value"]
p_mat[iter, "block.rand"] <- coef(summary(fit2))["treatmenttr", "p-value"]
fit2b <- lm(y_exp ~ block + treatment, data=fake_data)
b_mat[iter, "block.fix"] <- coef(summary(fit2b))["treatmenttr", "Estimate"]
p_mat[iter, "block.fix"] <- coef(summary(fit2b))["treatmenttr", "Pr(>|t|)"]
fit3 <- lm(y_exp ~ baseline_food + treatment, data=fake_data)
b_mat[iter, "block.adj"] <- coef(summary(fit3))["treatmenttr", "Estimate"]
p_mat[iter, "block.adj"] <- coef(summary(fit3))["treatmenttr", "Pr(>|t|)"]
}
Estimates
apply(b_mat, 2, quantile, probs=c(0.025, 0.5, 0.975))
## adj block.rand block.fix block.adj
## 2.5% 0.2477085 0.196368 0.196368 0.1924568
## 50% 1.0007202 1.016430 1.016430 1.0146775
## 97.5% 1.7449915 1.800058 1.800058 1.7891515
Power
apply(p_mat, 2, function(x) sum(x < 0.05)/niter)
## adj block.rand block.fix block.adj
## 0.720 0.555 0.541 0.590
Conclusion: over this model space, simply adjusting for baseline food consumption is more powerful than creating blocks using baseline food consumption.