# What is the consequence of a Shapiro-Wilk test-of-normality filter on Type I error and Power?

Set up Normal distribution Type I error Power Right skewed continuous – lognormal What the parameterizations look like Type I error Power This 1990-wants-you-back doodle explores the effects of a Normality Filter – using a Shapiro-Wilk (SW) test as a decision rule for using either a t-test or some alternative such as a 1) non-parametric Mann-Whitney-Wilcoxon (MWW) test, or 2) a t-test on the log-transformed response.

# Is the power to test an interaction effect less than that for a main effect?

I was googling around and somehow landed on a page that stated “When effect coding is used, statistical power is the same for all regression coefficients of the same size, whether they correspond to main effects or interactions, and irrespective of the order of the interaction”. Really? How could this be? The p-value for an interaction effect is the same regardless of dummy or effects coding, and, with dummy coding (R’s default), the power of the interaction effect is less than that of the coefficients for the main factors when they have the same magnitude, so my intuition said this statement must be wrong.

# What does cell biology data look like?

If I’m going to evaluate the widespread use of t-tests/ANOVAs on count data in bench biology then I’d like to know what these data look like, specifically the shape (“overdispersion”) parameter. Set up library(ggplot2) library(readxl) library(ggpubr) library(cowplot) library(plyr) #mapvalues library(data.table) # glm packages library(MASS) library(pscl) #zeroinfl library(DHARMa) library(mvabund) data_path <- "../data" # notebook, console source("../../../R/clean_labels.R") # notebook, console Data from The enteric nervous system promotes intestinal health by constraining microbiota composition Import read_enteric <- function(sheet_i, range_i, file_path, wide_2_long=TRUE){ dt_wide <- data.

# GLM vs. t-tests vs. non-parametric tests if all we care about is NHST -- Update

Update to the earlier post, which was written in response to my own thinking about how to teach stastics to experimental biologists working in fields that are dominated by hypothesis testing instead of estimation. That is, should these researchers learn GLMs or is a t-test on raw or log-transformed data on something like count data good enough – or even superior? My post was written without the benefit of either [Ives](Ives, Anthony R.

# Should we be skeptical of a "large" effect size if p > 0.05?

Motivator: A twitter comment “Isn’t the implication that the large effect size is a direct byproduct of the lack of power? i.e. that if the the study had more power, the effect size would have been found to be smaller.”1 2 A thought: our belief in the magnitude of an observed effect should be based on our priors, which, hopefully, are formed from good mechanistic models and not sample size“.3

# GLM vs. t-tests vs. non-parametric tests if all we care about is NHST

This post has been updated. A skeleton simulation of different strategies for NHST for count data if all we care about is a p-value, as in bench biology where p-values are used to simply give one confidence that something didn’t go terribly wrong (similar to doing experiments in triplicate – it’s not the effect size that matters only “we have experimental evidence of a replicable effect”). tl;dr - At least for Type I error at small $$n$$, log(response) and Wilcoxan have the best performance over the simulation space.

# On alpha

This post is motivated by Terry McGlynn’s thought provoking How do we move beyond an arbitrary statistical threshold? I have been struggling with the ideas explored in Terry’s post ever since starting my PhD 30 years ago, and its only been in the last couple of years that my own thoughts have begun to gel. This long marination period is largely because of my very classical biostatistical training. My PhD is from the Department of Anatomical Sciences at Stony Brook but the content was geometric morphometrics and James Rohlf was my mentor for morphometrics specifically, and multivariate statistics more generally.

#### R doodles. Some ecology. Some physiology. Much fake data.

Thoughts on R, statistical best practices, and teaching applied statistics to Biology majors.

Jeff Walker, Professor of Biological Sciences

University of Southern Maine, Portland, Maine, United States