What is the consequence of a Shapiro-Wilk test-of-normality filter on Type I error and Power?

August 8, 2019 in stats 101

This 1990-wants-you-back doodle explores the effects of a normality filter: using a Shapiro-Wilk (SW) test as a decision rule for choosing between a t-test and an alternative such as 1) a non-parametric Mann-Whitney-Wilcoxon (MWW) test or 2) a t-test on the log-transformed response.
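Here is a minimal R sketch of the Type I error half of the question. It rests on my own assumptions: two groups of n = 10 drawn from the same normal distribution, and a filter that runs `shapiro.test()` on each group and switches to `wilcox.test()` if either group fails at α = 0.05. The function name `filtered_p` is hypothetical, and the post's exact decision rule may differ.

```r
# Sketch of a Shapiro-Wilk "normality filter" simulated under the null (no true effect).
# Assumed rule: if either group fails shapiro.test() at alpha, use wilcox.test(),
# otherwise use a t-test. Sample size and number of simulations are arbitrary.
set.seed(1)
n_sim <- 2000
n <- 10
alpha <- 0.05

# hypothetical helper: p-value from whichever test the filter selects
filtered_p <- function(y1, y2, alpha = 0.05) {
  sw_fail <- shapiro.test(y1)$p.value < alpha || shapiro.test(y2)$p.value < alpha
  if (sw_fail) {
    wilcox.test(y1, y2)$p.value
  } else {
    t.test(y1, y2, var.equal = TRUE)$p.value
  }
}

p_filter <- numeric(n_sim)
p_t <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  y1 <- rnorm(n)
  y2 <- rnorm(n)
  p_filter[i] <- filtered_p(y1, y2, alpha)
  p_t[i] <- t.test(y1, y2, var.equal = TRUE)$p.value
}

# Type I error = proportion of p-values below alpha when the null is true
type1 <- c(t_test = mean(p_t < alpha), sw_filter = mean(p_filter < alpha))
print(type1)
```

Power can be estimated with the same loop by adding a true difference to one group, for example `y2 <- rnorm(n, mean = 1)`.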

What is the bias in the estimation of an effect given an omitted interaction term?

July 31, 2019 in stats 101

Some background (due to Sewall Wright’s method of path analysis). Given a generating model \[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \] where \(x_3 = x_1 x_2\) (that is, \(x_3\) is an interaction variable), the total effect of \(x_1\) on \(y\) is \(\beta_1 + \frac{\mathrm{COV}(x_1, x_2)}{\mathrm{VAR}(x_1)} \beta_2 + \frac{\mathrm{COV}(x_1, x_3)}{\mathrm{VAR}(x_1)} \beta_3\). If \(x_3\) (the interaction) is missing from the fitted model, its component of the total effect is added to the coefficient of \(x_1\).
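The total-effect expression can be checked numerically. Because the sample covariance is linear, the slope from regressing \(y\) on \(x_1\) alone, \(\mathrm{COV}(x_1, y)/\mathrm{VAR}(x_1)\), equals that sum exactly when \(y\) has no error term. The coefficient values and the way \(x_1\) and \(x_2\) are simulated below are arbitrary assumptions, not taken from the post.

```r
# Sketch: numerically check the path-analysis expression for the total effect of x1.
# Coefficients and the simulation of x1, x2 are arbitrary; y is noise-free,
# matching the generating model in the text.
set.seed(2)
n <- 1000
b0 <- 1; b1 <- 0.5; b2 <- -0.3; b3 <- 0.8
x1 <- rnorm(n, mean = 1)
x2 <- 0.4 * x1 + rnorm(n, mean = 1)  # x1 and x2 are correlated
x3 <- x1 * x2                         # the interaction variable
y <- b0 + b1 * x1 + b2 * x2 + b3 * x3

# slope of y on x1 alone vs. the direct path plus covariance-weighted paths
slope_simple <- unname(coef(lm(y ~ x1))["x1"])
slope_paths <- b1 + cov(x1, x2) / var(x1) * b2 + cov(x1, x3) / var(x1) * b3
all.equal(slope_simple, slope_paths)  # TRUE

# omitting the interaction from the multiple regression shifts the x1 coefficient
b1_without <- unname(coef(lm(y ~ x1 + x2))["x1"])      # not equal to b1
b1_with <- unname(coef(lm(y ~ x1 + x2 + x1:x2))["x1"])  # equals b1 (noise-free y)
c(without_interaction = b1_without, with_interaction = b1_with)
```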

Is the power to test an interaction effect less than that for a main effect?

July 2, 2019 in stats 101

I was googling around and somehow landed on a page that stated “When effect coding is used, statistical power is the same for all regression coefficients of the same size, whether they correspond to main effects or interactions, and irrespective of the order of the interaction”. Really? How could this be? The p-value for an interaction effect is the same regardless of dummy or effects coding, and, with dummy coding (R’s default), the power of the interaction effect is less than that of the coefficients for the main factors when they have the same magnitude, so my intuition said this statement must be wrong.
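A small simulation illustrates the dummy-coding half of this intuition in a 2 × 2 design. With R's default treatment contrasts, the `Aa2` coefficient is the effect of A at the reference level of B, with standard error \(\sigma\sqrt{2/n}\). The interaction coefficient has standard error \(\sigma\sqrt{4/n}\), so at equal coefficient size it should have lower power. The cell size, effect size, and number of simulations are arbitrary choices.

```r
# Sketch: power of the A coefficient vs. the A:B coefficient when both equal delta,
# using R's default dummy (treatment) coding in a 2 x 2 design.
set.seed(3)
n <- 10        # replicates per cell (arbitrary)
delta <- 1     # size of both the A and the A:B coefficients (arbitrary)
n_sim <- 1000

cells <- expand.grid(A = factor(c("a1", "a2")), B = factor(c("b1", "b2")))
design <- cells[rep(1:4, each = n), ]
X <- model.matrix(~ A * B, data = design)  # columns: (Intercept), Aa2, Bb2, Aa2:Bb2
beta <- c(0, delta, 0, delta)

p_main <- p_int <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  design$y <- drop(X %*% beta) + rnorm(nrow(X))  # sigma = 1
  coefs <- summary(lm(y ~ A * B, data = design))$coefficients
  p_main[i] <- coefs["Aa2", "Pr(>|t|)"]
  p_int[i] <- coefs["Aa2:Bb2", "Pr(>|t|)"]
}

# proportion of simulations rejecting at 0.05
pwr <- c(main = mean(p_main < 0.05), interaction = mean(p_int < 0.05))
pwr
```

Switching to effect coding, for example with `contrasts(design$A) <- contr.sum(2)`, changes what the non-interaction coefficients mean. That makes it a natural next step for probing the quoted claim.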

Analyze the mean (or median) and not the max response

June 25, 2019 in stats 101

This is an update of “Paired t-test as a special case of linear model and hierarchical model”. Figure 2A of the paper “Meta-omics analysis of elite athletes identifies a performance-enhancing microbe that functions via lactate metabolism” uses a paired t-test to compare endurance performance in mice treated with a control microbe (Lactobacillus bulgaricus) and a test microbe (Veillonella atypica) in a cross-over design (so each mouse was treated with both bacteria).

Paired t-test as a special case of linear model and hierarchical (linear mixed) model

June 25, 2019 in stats 101

Update: Fig. 2A is an analysis of the maximum endurance over three trials, and this has consequences. Figure 2A of the paper “Meta-omics analysis of elite athletes identifies a performance-enhancing microbe that functions via lactate metabolism” uses a paired t-test to compare endurance performance in mice treated with a control microbe (Lactobacillus bulgaricus) and a test microbe (Veillonella atypica) in a cross-over design (so each mouse was treated with both bacteria).
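The "special case" claim can be checked directly. In a balanced design with two treatments per mouse, a paired t-test gives the same t statistic as a linear model with mouse as a fixed blocking factor. The data below are simulated, not the paper's, and the number of mice, effect size, and variances are arbitrary. The hierarchical (linear mixed) version would be `lme4::lmer(y ~ treatment + (1 | mouse))`, which is not run here.

```r
# Sketch: paired t-test vs. linear model with mouse as a fixed effect.
# Simulated cross-over data; all settings are arbitrary.
set.seed(4)
n_mouse <- 12
mouse <- factor(rep(seq_len(n_mouse), times = 2))
treatment <- factor(rep(c("control", "test"), each = n_mouse))
mouse_effect <- rnorm(n_mouse, sd = 2)  # mouse-to-mouse variation
y <- 10 + mouse_effect[as.integer(mouse)] + 1 * (treatment == "test") +
  rnorm(2 * n_mouse)

# paired t-test: within each treatment, rows are in the same mouse order
tt <- t.test(y[treatment == "test"], y[treatment == "control"], paired = TRUE)

# linear model with mouse as a fixed (blocking) effect
fit <- lm(y ~ treatment + mouse)
lm_t <- summary(fit)$coefficients["treatmenttest", "t value"]

# equal up to floating-point error
c(paired_t = unname(tt$statistic), lm_t = lm_t)
```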

What does cell biology data look like?

June 9, 2019 in stats 101

If I’m going to evaluate the widespread use of t-tests/ANOVAs on count data in bench biology then I’d like to know what these data look like, specifically the shape (“overdispersion”) parameter.

Set up

library(ggplot2)
library(readxl)
library(ggpubr)
library(cowplot)
library(plyr) #mapvalues
library(data.table)
# glm packages
library(MASS)
library(pscl) #zeroinfl
library(DHARMa)
library(mvabund)
data_path <- "../data" # notebook, console
source("../../../R/clean_labels.R") # notebook, console

Data from “The enteric nervous system promotes intestinal health by constraining microbiota composition”

Import

read_enteric <- function(sheet_i, range_i, file_path, wide_2_long=TRUE){
  dt_wide <- data.
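The shape parameter here is the negative binomial θ, which `rnbinom()` calls `size`; a smaller θ means more overdispersion. The sketch below estimates it with `MASS::glm.nb()` on simulated counts, not the enteric nervous system data. The group means and the true θ = 2 are arbitrary assumptions.

```r
# Sketch: estimate the negative binomial shape (overdispersion) parameter theta.
# Simulated counts only; group means and true theta are arbitrary.
library(MASS)  # glm.nb

set.seed(5)
n <- 200  # per group
group <- factor(rep(c("control", "treated"), each = n))
mu <- ifelse(group == "control", 20, 30)
y <- rnbinom(2 * n, mu = mu, size = 2)  # true theta = 2

fit <- glm.nb(y ~ group)
fit$theta     # estimated shape parameter
fit$SE.theta  # its standard error
```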

GLM vs. t-tests vs. non-parametric tests if all we care about is NHST -- Update

May 30, 2019 in stats 101

Update to the earlier post, which was written in response to my own thinking about how to teach statistics to experimental biologists working in fields that are dominated by hypothesis testing instead of estimation. That is, should these researchers learn GLMs, or is a t-test on raw or log-transformed data, such as counts, good enough, or even superior? My post was written without the benefit of either Ives (Ives, Anthony R.


R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns