# The statistical significance filter

# 1 Why reported effect sizes are inflated

This post is motivated by many discussions in Gelman’s blog but start here

When we estimate an effect^{1}, the estimate will be a little inflated or a little diminished relative to the true effect but the expectation of the effect is the true effect. If all effects were reported, there would be no bias toward inflated effects. Reported effects are inflated if we use p-values to decide which to report and which to archive in the file drawer.

The magnitude of an estimate of an effect is a function of its true effect size plus sampling error (this is with a perfectly designed and executed study. In any real study there will be biases of various sorts). The absolute magnitude of sampling error is bigger with smaller \(n\). The relative magnitude is bigger for smaller true effect size. Consequently, estimates in low powered studies (some combination of low \(n\) and small true effect size) can be wildly off, especially relative to the true effect size. In low powered studies, it is these “wildly-off” estimates that are big enough to have \(p < 0.05\). This phenomenon attenuates as power increases because estimates are less and less wildely-off.

Here is a simulation of this

# 2 Setup

```
library(ggplot2)
library(GGally)
library(data.table)
source("../R/fake_x.R") # bookdown
```

# 3 Exploration 1

Modeling a typical set of experiments in ecology or physiology with \(p\) independent responses each with the same standardized effect size. How big are the reported effect sizes for the subset with \(p.val < 0.05\) (with or without correction for multiple testing). Make this a function of power.

```
n <- 20 # per treatment level. power will be a function of effect size
np1 <- n+1
N <- 2*n
niter <- 100 # number of iterations for each combination of fake data parameters
treatment_levels <- c("Cn", "Tr")
Treatment <- rep(treatment_levels, each=n)
p <- 50
b <- pval <- numeric(p)
combo <- 0 # which treatment combo
res_table <- data.table(NULL)
for(beta_1 in c(0.05, 0.15, 0.5, 1)){
combo <- combo + 1
j1 <- 0
j <- 0
res <- matrix(NA, nrow=niter*p, ncol=3)
colnames(res) <- c("ID", "b", "pval")
for(iter in 1:niter){
j1 <- j1 + j # completed row -- start row will be this plus 1
Y <- matrix(rnorm(n*2*p), nrow=n*2, ncol=p)
Y[np1:N,] <- Y[np1:N,] + beta_1
fit <- lm(Y ~ Treatment)
fit.coefs <- coef(summary(fit))
for(j in 1:p){# inefficient...how do I extract this without a
res[j1 + j, "ID"] <- niter*(combo - 1) + iter
res[j1 + j, "b"] <- fit.coefs[[j]]["TreatmentTr", "Estimate"]
res[j1 + j, "pval"] <- fit.coefs[[j]]["TreatmentTr", "Pr(>|t|)"]
}
} # iter
res_table <- rbind(res_table, data.table(beta=beta_1, res))
}
```

# 4 Unconditional means, power, and sign error

beta is the true effect. The unconditional mean is the mean of the estimated effect. The absolute value of the estimated effect is the measure of “size” or magnitude and the mean of the absolute values of the effect size will be bigger then the mean effect size if the true effect size is near zero.

```
table1 <- res_table[, .(mean_unconditional=mean(b),
mean_abs_unconditional=mean(abs(b)),
power = sum(pval < 0.05 & b > 0)/niter/p,
sign.error=sum(pval < 0.05 & b < 0)/niter/p), by=.(beta)]
knitr::kable(table1, digits=c(2, 2, 2, 2, 3))
```

beta | mean_unconditional | mean_abs_unconditional | power | sign.error |
---|---|---|---|---|

0.05 | 0.05 | 0.25 | 0.04 | 0.017 |

0.15 | 0.15 | 0.28 | 0.07 | 0.008 |

0.50 | 0.50 | 0.51 | 0.34 | 0.001 |

1.00 | 1.01 | 1.01 | 0.88 | 0.000 |

# 5 Conditional means

The conditional mean is the mean effect size conditional on pval < filter. Again, beta is the true effect. And again, the absolute estimate (\(|b|\)) is the measure of effect “size”.

## 5.1 filter = 0.05

```
table2 <- res_table[pval < 0.05, .(mean_conditional=mean(b),
mean_abs.conditional=mean(abs(b)),
multiplier = mean(abs(b))/beta), by=.(beta)]
knitr::kable(table2, digits=c(2, 2, 2, 1))
```

beta | mean_conditional | mean_abs.conditional | multiplier |
---|---|---|---|

0.05 | 0.29 | 0.73 | 14.5 |

0.15 | 0.59 | 0.74 | 4.9 |

0.50 | 0.83 | 0.83 | 1.7 |

1.00 | 1.08 | 1.08 | 1.1 |

## 5.2 filter = 0.2

```
table3 <- res_table[pval < 0.2, .(mean_conditional=mean(b),
mean_abs.conditional=mean(abs(b)),
multiplier = mean(abs(b))/beta), by=.(beta)]
knitr::kable(table3, digits=c(2, 2, 2, 1))
```

beta | mean_conditional | mean_abs.conditional | multiplier |
---|---|---|---|

0.05 | 0.15 | 0.55 | 11.1 |

0.15 | 0.42 | 0.57 | 3.8 |

0.50 | 0.69 | 0.69 | 1.4 |

1.00 | 1.03 | 1.03 | 1.0 |

for example, in an experiment, if we compare the mean response between a control group and a treated group, the difference in means is the effect. More generally, an effect is the coefficient of a linear model↩