# What is the consequence of a Shapiro-Wilk test-of-normality filter on Type I error and Power?

Contents: Set up; Normal distribution (Type I error, Power); Right-skewed continuous: lognormal (What the parameterizations look like, Type I error, Power).

This 1990-wants-you-back doodle explores the effects of a normality filter: using a Shapiro-Wilk (SW) test as a decision rule for choosing between a t-test and some alternative, such as 1) a non-parametric Mann-Whitney-Wilcoxon (MWW) test, or 2) a t-test on the log-transformed response.
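A minimal sketch of the filter in R, for the lognormal case with the null hypothesis true. The sample size, the number of simulated data sets, applying SW to the within-group residuals, and MWW as the alternative are assumptions for illustration, not necessarily the settings used in the post. The Type I error is the proportion of filtered p-values below 0.05.

```r
# Sketch: Type I error of a Shapiro-Wilk "normality filter" for two-group data.
# Assumptions (not from the post): lognormal data with the null true,
# n = 10 per group, SW applied to within-group residuals, alpha = 0.05.
set.seed(2)
n_sim <- 2000
n <- 10
alpha <- 0.05
p_filter <- numeric(n_sim)
went_mww <- logical(n_sim)
for (i in seq_len(n_sim)) {
  a <- rlnorm(n)  # both groups drawn from the same distribution
  b <- rlnorm(n)
  res <- c(a - mean(a), b - mean(b))  # residuals from the two group means
  went_mww[i] <- shapiro.test(res)$p.value < alpha
  p_filter[i] <- if (went_mww[i]) {
    wilcox.test(a, b)$p.value               # normality rejected: use MWW
  } else {
    t.test(a, b, var.equal = TRUE)$p.value  # normality not rejected: use t-test
  }
}
c(type_1_error = mean(p_filter < alpha), prop_mww = mean(went_mww))
```

Swapping `wilcox.test(a, b)` for `t.test(log(a), log(b), var.equal = TRUE)` gives the second alternative (a t-test on the log-transformed response), and drawing `b` with a shifted `meanlog` turns the same loop into a power estimate.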

# What is the bias in the estimation of an effect given an omitted interaction term?

Some background (from Sewall Wright's method of path analysis). Given the generating model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

where $x_3 = x_1 x_2$ (that is, $x_3$ is an interaction variable), the total effect of $x_1$ on $y$ is

$$\beta_1 + \frac{\mathrm{COV}(x_1, x_2)}{\mathrm{VAR}(x_1)} \beta_2 + \frac{\mathrm{COV}(x_1, x_3)}{\mathrm{VAR}(x_1)} \beta_3$$

If $x_3$ (the interaction) is omitted from the fitted model, its component of the total effect is added to the coefficient of $x_1$.
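A quick simulation can check the total-effect expression. This is a minimal sketch, not the post's own code: the coefficient values, the independence of $x_1$ and $x_2$, their means of 1 (chosen so that $x_1$ and $x_3$ are correlated), and the added Normal error are all assumptions. With these values $\mathrm{COV}(x_1, x_3)/\mathrm{VAR}(x_1) = E(x_2) = 1$, so the expression predicts a coefficient of $0.5 + 0 + 1 = 1.5$ for $x_1$ when the interaction is omitted, versus $\beta_1 = 0.5$ when it is included.

```r
# Sketch: bias in the x1 coefficient when the x1:x2 interaction is omitted.
# Assumed values (not from the post): beta = (0, 0.5, 0.5, 1); x1 and x2
# independent Normal with mean 1 and variance 1; Normal error with sd 1.
set.seed(1)
n <- 1e5
b0 <- 0; b1 <- 0.5; b2 <- 0.5; b3 <- 1
x1 <- rnorm(n, mean = 1)
x2 <- rnorm(n, mean = 1)
x3 <- x1 * x2  # the interaction variable
y <- b0 + b1 * x1 + b2 * x2 + b3 * x3 + rnorm(n)

# Coefficient of x1 with and without the interaction in the fitted model
b1_full <- unname(coef(lm(y ~ x1 + x2 + x3))["x1"])
b1_omit <- unname(coef(lm(y ~ x1 + x2))["x1"])

# Total-effect expression, evaluated with sample covariances
b1_total <- b1 + cov(x1, x2) / var(x1) * b2 + cov(x1, x3) / var(x1) * b3

round(c(full = b1_full, omitted = b1_omit, total_effect = b1_total), 2)
```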

# GLM vs. t-tests vs. non-parametric tests if all we care about is NHST -- Update

Update to the earlier post, which was written in response to my own thinking about how to teach statistics to experimental biologists working in fields dominated by hypothesis testing rather than estimation. That is, should these researchers learn GLMs, or is a t-test on raw or log-transformed count data good enough, or even superior? My post was written without the benefit of either Ives (Ives, Anthony R.
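One way to pose the question is by simulation. The sketch below, in R, estimates the Type I error of a t-test on raw counts, a t-test on log(y + 1), and a negative binomial GLM (`glm.nb` from the MASS package, tested with its Wald z statistic). The count distribution, sample size, number of simulations, and the +1 offset in the log transform are assumptions for illustration, not the settings of either post.

```r
# Sketch: Type I error of three tests on negative binomial counts.
# Assumptions (not from the post): NB counts with mean 5 and size 1 in both
# groups (the null is true), n = 10 per group, Wald test for the NB GLM.
library(MASS)  # provides glm.nb()
set.seed(4)
n_sim <- 1000
n <- 10
group <- factor(rep(c("a", "b"), each = n))
p <- matrix(NA_real_, n_sim, 3,
            dimnames = list(NULL, c("t_raw", "t_log", "nb_glm")))
for (i in seq_len(n_sim)) {
  y <- rnbinom(2 * n, mu = 5, size = 1)
  p[i, "t_raw"] <- t.test(y ~ group, var.equal = TRUE)$p.value
  p[i, "t_log"] <- t.test(log(y + 1) ~ group, var.equal = TRUE)$p.value
  p[i, "nb_glm"] <- tryCatch({
    fit <- suppressWarnings(glm.nb(y ~ group))
    coef(summary(fit))["groupb", "Pr(>|z|)"]
  }, error = function(e) NA_real_)  # rare fitting failures recorded as NA
}
# Proportion of p-values below 0.05 estimates each test's Type I error
colMeans(p < 0.05, na.rm = TRUE)
```

Shifting `mu` in one group turns the same loop into a power comparison.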

# The statistical significance filter

Contents: 1 Why reported effect sizes are inflated; 2 Setup; 3 Exploration 1; 4 Unconditional means, power, and sign error; 5 Conditional means (5.1 filter = 0.05; 5.2 filter = 0.2).

1 Why reported effect sizes are inflated

This post is motivated by many discussions on Gelman's blog, but start here. When we estimate an effect, the estimate will be a little inflated or a little diminished relative to the true effect, but the expectation of the estimate is the true effect.
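The filter is easy to see in a simulation. This is a hedged sketch in R, not the post's code: the true effect (0.5 SD), n = 10 per group, the number of simulated experiments, and the Student's t-test are assumptions. Averaging all estimates recovers the true effect; averaging only those with p < `filter_alpha` does not.

```r
# Sketch of the statistical significance filter: simulate many two-group
# experiments and compare the mean estimated effect before and after
# keeping only the "significant" ones.
# Assumptions (not the post's settings): true effect 0.5 SD, n = 10 per group.
set.seed(3)
n_sim <- 5000
n <- 10
true_effect <- 0.5
filter_alpha <- 0.05
est <- numeric(n_sim)
p <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  control <- rnorm(n)
  treated <- rnorm(n, mean = true_effect)
  est[i] <- mean(treated) - mean(control)
  p[i] <- t.test(treated, control, var.equal = TRUE)$p.value
}
sig <- p < filter_alpha
c(unconditional_mean = mean(est),    # close to the true effect
  power = mean(sig),                 # proportion of significant experiments
  conditional_mean = mean(est[sig]), # inflated by the filter
  sign_error = mean(est[sig] < 0))   # significant but with the wrong sign
```

Setting `filter_alpha <- 0.2` mimics the second filter in the contents; a looser filter lets more estimates through, so the conditional mean is inflated less.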

# Paired line plots

ggplot scripts to draw figures like those in the Dynamic Ecology post: paired line plots (a.k.a. "reaction norms") to visualize Likert data.

load libraries

```r
library(ggplot2)
library(ggpubr)
library(data.table)
```

make some fake data

```r
set.seed(3)
n <- 40
self <- rbinom(n, 5, 0.25) + 1         # each student's rating for "self"
others <- self + rbinom(n, 3, 0.5)     # rating for "others", never below "self"
fd <- data.table(id = factor(rep(1:n, 2)),
                 who = factor(rep(c("self", "others"), each = n)),
                 stigma = c(self, others))  # "=" names the column; "<-" would not
```

make a plot with ggplot

The students are identified by the column "id".
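The excerpt stops before the plotting code. A paired line plot can be sketched in ggplot2 by mapping `id` to the `group` aesthetic, so each student's two ratings are joined by one line. This is an assumption about how to draw it, not the Dynamic Ecology figure's code; the transparency, the axis label, and putting "self" before "others" on the x-axis are illustrative choices.

```r
# Sketch of a paired line plot: one line per student linking "self" to "others".
# Styling choices (alpha, axis label, level order) are assumptions.
library(ggplot2)
library(data.table)

# Same fake data as above, with the levels of "who" reordered
set.seed(3)
n <- 40
self <- rbinom(n, 5, 0.25) + 1
others <- self + rbinom(n, 3, 0.5)
fd <- data.table(id = factor(rep(1:n, 2)),
                 who = factor(rep(c("self", "others"), each = n),
                              levels = c("self", "others")),
                 stigma = c(self, others))

gg <- ggplot(fd, aes(x = who, y = stigma, group = id)) +
  geom_line(alpha = 0.3) +   # one line per student (grouped by id)
  geom_point(alpha = 0.3) +
  labs(x = NULL, y = "Stigma (Likert score)") +
  theme_minimal()
gg
```

Because Likert scores are discrete, many lines overlap; the transparency makes heavily shared paths darker.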

# A simple ggplot of some measure against depth

set up

The goal is to plot the measure of something, say O2 levels, against depth (soil or lake), with the measures taken on multiple days.

```r
library(ggplot2)
library(data.table)
```

First, create fake data.

```r
depths <- c(0, seq(10, 100, by = 10))
dates <- c("Jan-18", "Mar-18", "May-18", "Jul-18")
x <- expand.grid(date = dates, depth = depths)
n <- nrow(x)
head(x)
##     date depth
## 1 Jan-18     0
## 2 Mar-18     0
## 3 May-18     0
## 4 Jul-18     0
## 5 Jan-18    10
## 6 Mar-18    10
```

`X <- model.` …
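A minimal sketch of the plot itself, in R with ggplot2. The O2 values below are hypothetical placeholders (the excerpt is cut off before the post builds its response), and drawing depth on a reversed y-axis with one path per date is an assumed design, chosen so the plot reads like a depth profile.

```r
# Sketch: plot a measure against depth, one path per sampling date.
# The O2 values are hypothetical placeholders, not the post's fake data.
library(ggplot2)

set.seed(1)
depths <- c(0, seq(10, 100, by = 10))
dates <- c("Jan-18", "Mar-18", "May-18", "Jul-18")
x <- expand.grid(date = dates, depth = depths)
x$date <- factor(x$date, levels = dates)  # keep dates in calendar order
x$o2 <- 10 - 0.05 * x$depth + rnorm(nrow(x), sd = 0.5)  # hypothetical O2 values

gg <- ggplot(x, aes(x = o2, y = depth, color = date)) +
  geom_path() +        # connects points in row order, i.e. by increasing depth
  geom_point() +
  scale_y_reverse() +  # depth 0 at the top
  labs(x = "O2", y = "Depth")
gg
```

`geom_path()` is used instead of `geom_line()` because `geom_line()` connects points in order of the x variable (here O2), whereas a profile should be connected in order of depth.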

#### R doodles. Some ecology. Some physiology. Much fake data.

Thoughts on R, statistical best practices, and teaching applied statistics to Biology majors.

Jeff Walker, Professor of Biological Sciences

University of Southern Maine, Portland, Maine, United States