# The statistical significance filter

1 Why reported effect sizes are inflated 2 Setup 3 Exploration 1 4 Unconditional means, power, and sign error 5 Conditional means 5.1 filter = 0.05 5.2 filter = 0.2 1 Why reported effect sizes are inflated This post is motivated by many discussions in Gelman’s blog but start here When we estimate an effect1, the estimate will be a little inflated or a little diminished relative to the true effect but the expectation of the effect is the true effect.

# Covariate adjustment in randomized experiments

The post motivated by a tweetorial from Darren Dahly In an experiment, do we adjust for covariates that differ between treatment levels measured pre-experiment (“imbalance” in random assignment), where a difference is inferred from a t-test with p < 0.05? Or do we adjust for all covariates, regardless of differences pre-test? Or do we adjust only for covariates that have sustantial correlation with the outcome? Or do we not adjust at all?

# What to write, and not write, in a results section — an ever-growing list

“GPP (n=4 per site) increased from the No Wildlife site to the Hippo site but was lowest at the Hippo + WB site (Fig. 6); however, these differences were not significant due to low sample sizes and high variability.” – Subalusky, A.L., Dutton, C.L., Njoroge, L., Rosi, E.J., and Post, D.M. (2018). Organic matter and nutrient inputs from large wildlife influence ecosystem function in the Mara River, Africa. Ecology 99, 2558–2574.

# Paired line plots

load libraries make some fake data make a plot with ggplot ggplot scripts to draw figures like those in the Dynamic Ecology post Paired line plots (a.k.a. “reaction norms”) to visualize Likert data load libraries library(ggplot2) library(ggpubr) library(data.table) make some fake data set.seed(3) n <- 40 self <- rbinom(n, 5, 0.25) + 1 others <- self + rbinom(n, 3, 0.5) fd <- data.table(id=factor(rep(1:n, 2)), who=factor(rep(c("self", "others"), each=n)), stigma <- c(self, others)) make a plot with ggplot The students are identified by the column “id”.

# GLM vs. t-tests vs. non-parametric tests if all we care about is NHST

A skeleton simulation of different strategies for NHST for count data if all we care about is a p-value, as in bench biology where p-values are used to simply give one confidence that something didn’t go terribly wrong (similar to doing experiments in triplicate – it’s not the effect size that matters only “we have experimental evidence of a replicatable effect”) load libraries library(ggplot2) library(MASS) library(data.table) do_sim <- function(){ set.seed(1) niter <- 1000 methods <- c("t", "Welch", "log", "Wilcoxan", "nb") p_table <- matrix(NA, nrow=niter, ncol=length(methods)) colnames(p_table) <- methods res_table <- data.

# Expected covariances in a causal network

This is a skeleton post Standardized variables (Wright’s rules) n <- 10^5 # z is the common cause of g1 and g2 z <- rnorm(n) # effects of z on g1 and g2 b1 <- 0.7 b2 <- 0.7 r12 <- b1*b2 g1 <- b1*z + sqrt(1-b1^2)*rnorm(n) g2 <- b2*z + sqrt(1-b2^2)*rnorm(n) var(g1) # E(VAR(g1)) = 1 ## [1] 0.997149 var(g2) # E(VAR(g2)) = 1 ## [1] 0.9972956 cor(g1, g2) # E(COR(g1,g2)) = b1*b2 ## [1] 0.

# Compute a random data matrix (fake data) without rmvnorm

This is a skeleton post until I have time to flesh it out. The post is motivated by a question on twitter about creating fake data that has a covariance matrix that simulates a known (given) covariance matrix that has one or more negative (or zero) eigenvalues. First, some libraries library(data.table) library(mvtnorm) library(MASS) Second, some functions… random.sign <- function(u){ # this is fastest of three out <- sign(runif(u)-0.5) #randomly draws from {-1,1} with probability of each = 0.
• page 1 of 3

#### R doodles. Some ecology. Some physiology. Much fake data.

Thoughts on R, statistical best practices, and teaching applied statistics to Biology majors

Jeff Walker, Professor of Biological Sciences

University of Southern Maine, Portland, Maine, United States