Covariate adjustment in randomized experiments

April 12, 2019 in stats 101

This post is motivated by a tweetorial from Darren Dahly. In an experiment, do we adjust for covariates that differ between treatment levels measured pre-experiment (“imbalance” in random assignment), where a difference is inferred from a t-test with p < 0.05? Or do we adjust for all covariates, regardless of differences pre-test? Or do we adjust only for covariates that have substantial correlation with the outcome? Or do we not adjust at all?
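To make two of these options concrete, here is a minimal sketch with fake data that compares "do not adjust at all" with "adjust regardless of observed imbalance". The sample size, effect sizes, and variable names (`x`, `treat`, `y`) are illustrative assumptions, not values from the post.

```r
# Fake data: a randomized two-arm experiment with one pre-treatment covariate.
# All parameter values below are assumptions chosen for illustration.
set.seed(1)
n <- 200
x <- rnorm(n)                                # baseline covariate
treat <- sample(rep(c(0, 1), each = n / 2))  # random assignment to two arms
y <- 0.5 * treat + 0.7 * x + rnorm(n)        # simulated treatment effect = 0.5

fit_unadj <- lm(y ~ treat)      # no covariate adjustment
fit_adj   <- lm(y ~ treat + x)  # adjust for x, whatever the observed imbalance

# Standard error of the estimated treatment effect under each model
se_unadj <- coef(summary(fit_unadj))["treat", "Std. Error"]
se_adj   <- coef(summary(fit_adj))["treat", "Std. Error"]
c(unadjusted = se_unadj, adjusted = se_adj)
```

In this setup the covariate is, by construction, correlated with the outcome, so the adjusted model should give the smaller standard error; with a covariate unrelated to the outcome the two would be nearly the same.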

What to write, and not write, in a results section — an ever-growing list

January 31, 2019 in stats 101

“GPP (n=4 per site) increased from the No Wildlife site to the Hippo site but was lowest at the Hippo + WB site (Fig. 6); however, these differences were not significant due to low sample sizes and high variability.” If we know these are not significant due to low sample size and high variability, why even do the test? “TRE led to a modest, but not significant, increase in sleep duration to 449.

Paired line plots

January 22, 2019 in ggplot

ggplot scripts to draw figures like those in the Dynamic Ecology post Paired line plots (a.k.a. “reaction norms”) to visualize Likert data.
load libraries:
    library(ggplot2)
    library(ggpubr)
    library(data.table)
make some fake data:
    set.seed(3)
    n <- 40
    self <- rbinom(n, 5, 0.25) + 1
    others <- self + rbinom(n, 3, 0.5)
    fd <- data.table(id=factor(rep(1:n, 2)), who=factor(rep(c("self", "others"), each=n)), stigma=c(self, others))
make a plot with ggplot: The students are identified by the column “id”.
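The excerpt stops before the plotting code, so the following is only a sketch of what a paired line plot of this fake data could look like, not the post's own script. It assumes ggplot2 is installed, uses a plain `data.frame` instead of `data.table` to keep dependencies down, and the axis label and styling are made up.

```r
library(ggplot2)

# Rebuild the fake data from the excerpt (data.frame used here for simplicity)
set.seed(3)
n <- 40
self <- rbinom(n, 5, 0.25) + 1
others <- self + rbinom(n, 3, 0.5)
fd <- data.frame(
  id = factor(rep(1:n, 2)),
  # fix the level order so "self" is drawn on the left
  who = factor(rep(c("self", "others"), each = n), levels = c("self", "others")),
  stigma = c(self, others)
)

# One line per student: grouping by id joins each student's two responses
gg <- ggplot(fd, aes(x = who, y = stigma, group = id)) +
  geom_line(alpha = 0.3) +
  geom_point(alpha = 0.3) +
  labs(x = NULL, y = "Stigma score") +  # axis label is an assumption
  theme_minimal()
gg
```

Because Likert scores are discrete, many lines overlap exactly; the transparency (`alpha`) is one simple way to show where lines pile up.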

GLM vs. t-tests vs. non-parametric tests if all we care about is NHST

January 7, 2019 in stats 101

This post has been updated. A skeleton simulation of different strategies for NHST for count data if all we care about is a p-value, as in bench biology where p-values are used simply to give one confidence that something didn’t go terribly wrong (similar to doing experiments in triplicate: it’s not the effect size that matters, only “we have experimental evidence of a replicable effect”). tl;dr: At least for Type I error at small \(n\), log(response) and Wilcoxon have the best performance over the simulation space.
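A stripped-down version of this kind of simulation might look like the sketch below. It is not the post's code: the negative binomial data model, its parameters (`mu`, `theta`), the group size, and the use of `log(y + 1)` to avoid taking the log of zero counts are all assumptions made here for illustration.

```r
# Type I error of three tests when both groups come from the same
# count distribution, so every "significant" result is a false positive.
set.seed(1)
n_iter <- 2000  # number of simulated experiments
n <- 5          # sample size per group (small n)
mu <- 10        # negative binomial mean (assumed)
theta <- 1      # negative binomial size/dispersion (assumed)

p_vals <- replicate(n_iter, {
  y1 <- rnbinom(n, size = theta, mu = mu)
  y2 <- rnbinom(n, size = theta, mu = mu)
  c(
    t_test   = t.test(y1, y2, var.equal = TRUE)$p.value,
    log_t    = t.test(log(y1 + 1), log(y2 + 1), var.equal = TRUE)$p.value,
    # ties in count data trigger a warning about exact p-values
    wilcoxon = suppressWarnings(wilcox.test(y1, y2)$p.value)
  )
})

# Fraction of simulated experiments with p < 0.05, per test
type1 <- rowMeans(p_vals < 0.05, na.rm = TRUE)
type1
```

Each value in `type1` is an estimate of the Type I error rate, to be compared against the nominal 0.05. Looping this over a grid of `n`, `mu`, and `theta` would give a simulation space of the kind the excerpt refers to.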

Expected covariances in a causal network

January 3, 2019

This is a skeleton post.
Standardized variables (Wright’s rules)
    n <- 10^5
    # z is the common cause of g1 and g2
    z <- rnorm(n)
    # effects of z on g1 and g2
    b1 <- 0.7
    b2 <- 0.7
    r12 <- b1*b2
    g1 <- b1*z + sqrt(1-b1^2)*rnorm(n)
    g2 <- b2*z + sqrt(1-b2^2)*rnorm(n)
    var(g1) # E(VAR(g1)) = 1
    ## [1] 1.001849
    var(g2) # E(VAR(g2)) = 1
    ## [1] 1.006102
    cor(g1, g2) # E(COR(g1,g2)) = b1*b2
    ## [1] 0.

Compute a random data matrix (fake data) without rmvnorm

December 20, 2018

This is a skeleton post until I have time to flesh it out. The post is motivated by a question on twitter about creating fake data that has a covariance matrix that simulates a known (given) covariance matrix that has one or more negative (or zero) eigenvalues.
First, some libraries
    library(data.table)
    library(mvtnorm)
    library(MASS)
Second, some functions…
    random.sign <- function(u){
      # this is fastest of three
      out <- sign(runif(u)-0.5) #randomly draws from {-1,1} with probability of each = 0.
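The excerpt is cut off before the post's own method appears, so the sketch below shows a different, generic approach using only base R: take the eigendecomposition of the target matrix, set negative eigenvalues to zero, and transform standard normal draws. The function name `fake_data` and the example matrix `S` are hypothetical.

```r
# Draw n rows of fake data whose covariance approximates S, without rmvnorm.
# Negative eigenvalues are clamped to zero, so the data reproduce the nearest
# positive semi-definite version of S in this simple sense, not S itself.
fake_data <- function(n, S) {
  e <- eigen(S, symmetric = TRUE)
  lambda <- pmax(e$values, 0)                # clamp negative eigenvalues
  Z <- matrix(rnorm(n * ncol(S)), nrow = n)  # independent standard normals
  # cov(Z %*% A) is approximately t(A) %*% A = V diag(lambda) t(V)
  Z %*% diag(sqrt(lambda), nrow = length(lambda)) %*% t(e$vectors)
}

# A "correlation" matrix that is not a valid covariance matrix:
# it has one negative eigenvalue.
S <- matrix(c( 1.0,  0.9,  0.9,
               0.9,  1.0, -0.9,
               0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
eigen(S, symmetric = TRUE)$values

set.seed(1)
X <- fake_data(1000, S)
cov(X)  # compare with S; the match cannot be exact because S is indefinite
```

Clamping is the crudest repair. It changes the diagonal of the implied covariance matrix, so rescaling or a proper nearest-correlation-matrix routine may be preferable when the variances must be preserved.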

Reporting effects as relative differences...with a confidence interval

November 14, 2018 in stats 101

Researchers frequently report results as relative effects, for example, “Male flies from selected lines had 50% larger upwind flight ability than male flies from control lines (Control mean: 117.5 cm/s; Selected mean 176.5 cm/s).” where a relative effect is \[100 \frac{\bar{y}_B - \bar{y}_A}{\bar{y}_A}\] If we are to follow best practices, we should present this effect with a measure of uncertainty, such as a confidence interval. The absolute effect is 59.
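As a quick check of the formula, and one possible way to attach an interval to it, here is a sketch using a percentile bootstrap. Only the two group means come from the quoted example; the raw data are fake, and the sample size (20 per group) and standard deviation (30 cm/s) are assumptions. The excerpt does not show which interval method the post uses, so the bootstrap is a stand-in.

```r
# Relative effect of group b over group a, as a percentage
rel_effect <- function(a, b) 100 * (mean(b) - mean(a)) / mean(a)

# Using the reported means directly: 100 * (176.5 - 117.5) / 117.5
rel_effect(117.5, 176.5)

# Fake raw data centered on the reported means (n and sd are assumptions)
set.seed(1)
control  <- rnorm(20, mean = 117.5, sd = 30)
selected <- rnorm(20, mean = 176.5, sd = 30)

# Percentile bootstrap: resample each group independently, recompute the effect
boot <- replicate(2000, rel_effect(sample(control, replace = TRUE),
                                   sample(selected, replace = TRUE)))
ci <- quantile(boot, c(0.025, 0.975))
c(estimate = rel_effect(control, selected), ci)
```

Resampling both groups matters here because the control mean appears in the denominator, so its sampling error feeds into the ratio as well as the difference.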


R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns