1 Why reported effect sizes are inflated
2 Setup
3 Exploration 1
4 Unconditional means, power, and sign error
5 Conditional means
- 5.1 filter = 0.05
- 5.2 filter = 0.2

1 Why reported effect sizes are inflated

This post is motivated by many discussions in Gelman’s blog but start here

When we estimate an effect¹, the estimate will be a little inflated or a little diminished relative to the true effect but the expectation of the effect is the true effect. If all effects were reported, there would be no bias toward inflated effects. Reported effects are inflated if we use p-values to decide which to report and which to archive in the file drawer.

The magnitude of an estimate of an effect is a function of its true effect size plus sampling error (this is with a perfectly designed and executed study. In any real study there will be biases of various sorts). The absolute magnitude of sampling error is bigger with smaller \(n\). The relative magnitude is bigger for smaller true effect size. Consequently, estimates in low powered studies (some combination of low \(n\) and small true effect size) can be wildly off, especially relative to the true effect size. In low powered studies, it is these “wildly-off” estimates that are big enough to have \(p < 0.05\). This phenomenon attenuates as power increases because estimates are less and less wildely-off.

Here is a simulation of this

2 Setup

library(ggplot2)
library(GGally)
library(data.table)

source("../R/fake_x.R") # bookdown

3 Exploration 1

Modeling a typical set of experiments in ecology or physiology with \(p\) independent responses each with the same standardized effect size. How big are the reported effect sizes for the subset with \(p.val < 0.05\) (with or without correction for multiple testing). Make this a function of power.

n <- 20 # per treatment level. power will be a function of effect size
np1 <- n+1
N <- 2*n
niter <- 100 # number of iterations for each combination of fake data parameters
treatment_levels <- c("Cn", "Tr")
Treatment <- rep(treatment_levels, each=n)
p <- 50
b <- pval <- numeric(p)
combo <- 0 # which treatment combo
res_table <- data.table(NULL)
for(beta_1 in c(0.05, 0.15, 0.5, 1)){
  combo <- combo + 1
  j1 <- 0
  j <- 0
  res <- matrix(NA, nrow=niter*p, ncol=3)
  colnames(res) <- c("ID", "b", "pval")
  for(iter in 1:niter){
    j1 <- j1 + j # completed row -- start row will be this plus 1
    Y <- matrix(rnorm(n*2*p), nrow=n*2, ncol=p)
    Y[np1:N,] <- Y[np1:N,] + beta_1
    fit <- lm(Y ~ Treatment)
    fit.coefs <- coef(summary(fit))
    for(j in 1:p){# inefficient...how do I extract this without a 
      res[j1 + j, "ID"] <- niter*(combo - 1) + iter
      res[j1 + j, "b"] <- fit.coefs[[j]]["TreatmentTr", "Estimate"]
      res[j1 + j, "pval"] <- fit.coefs[[j]]["TreatmentTr", "Pr(>|t|)"]
    }
  } # iter
  res_table <- rbind(res_table, data.table(beta=beta_1, res))
}

4 Unconditional means, power, and sign error

beta is the true effect. The unconditional mean is the mean of the estimated effect. The absolute value of the estimated effect is the measure of “size” or magnitude and the mean of the absolute values of the effect size will be bigger then the mean effect size if the true effect size is near zero.

table1 <- res_table[, .(mean_unconditional=mean(b),
              mean_abs_unconditional=mean(abs(b)),
              power = sum(pval < 0.05 & b > 0)/niter/p,
              sign.error=sum(pval < 0.05 & b < 0)/niter/p), by=.(beta)]
knitr::kable(table1, digits=c(2, 2, 2, 2, 3))

beta	mean_unconditional	mean_abs_unconditional	power	sign.error
0.05	0.05	0.25	0.04	0.015
0.15	0.15	0.28	0.06	0.006
0.50	0.51	0.52	0.35	0.000
1.00	1.00	1.00	0.87	0.000

5 Conditional means

The conditional mean is the mean effect size conditional on pval < filter. Again, beta is the true effect. And again, the absolute estimate (\(|b|\)) is the measure of effect “size”.

5.1 filter = 0.05

table2 <- res_table[pval < 0.05, .(mean_conditional=mean(b),
                         mean_abs.conditional=mean(abs(b)),
                         multiplier = mean(abs(b))/beta), by=.(beta)]
knitr::kable(table2, digits=c(2, 2, 2, 1))

beta	mean_conditional	mean_abs.conditional	multiplier
0.05	0.32	0.73	14.6
0.15	0.63	0.74	4.9
0.50	0.83	0.83	1.7
1.00	1.08	1.08	1.1

5.2 filter = 0.2

table3 <- res_table[pval < 0.2, .(mean_conditional=mean(b),
                         mean_abs.conditional=mean(abs(b)),
                         multiplier = mean(abs(b))/beta), by=.(beta)]
knitr::kable(table3, digits=c(2, 2, 2, 1))

beta	mean_conditional	mean_abs.conditional	multiplier
0.05	0.16	0.55	11.1
0.15	0.42	0.57	3.8
0.50	0.69	0.70	1.4
1.00	1.02	1.02	1.0

for example, in an experiment, if we compare the mean response between a control group and a treated group, the difference in means is the effect. More generally, an effect is the coefficient of a linear model↩

The statistical significance filter

1 Why reported effect sizes are inflated

2 Setup

3 Exploration 1

4 Unconditional means, power, and sign error

5 Conditional means

5.1 filter = 0.05

5.2 filter = 0.2

R doodles. Some ecology. Some physiology. Much fake data.

The statistical significance filter

1 Why reported effect sizes are inflated

2 Setup

3 Exploration 1

4 Unconditional means, power, and sign error

5 Conditional means

5.1 filter = 0.05

5.2 filter = 0.2

R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns