Bootstrap confidence intervals when sample size is really small

June 1, 2020 in stats 101

TL;DR A sample table from the full results for data that look like this

Table 1: Coverage of 95% BCa CIs.

parameter             n=5    n=10   n=20   n=40   n=80
means
  Control             81.4   87.6   92.2   93.0   93.6
  b4GalT1-/-          81.3   90.2   90.8   93.0   93.8
difference in means
  diff                83.

"Nested" random factors in mixed (multilevel or hierarchical) models

November 30, 2019 in stats 101

Setup Import Models as nested using “tank” nested within “room” as two random intercepts (using lme4 to create the combinations) A safer (lme4) way to create the combinations of “room” and “tank”: as two random intercepts using “tank2” Don’t do this

This is a skeletal post to show the equivalence of different ways of thinking about “nested” factors in a mixed model. The data are measures of life history traits in lice that infect salmon.
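As a rough illustration of why a combined label such as “tank2” matters, here is a minimal base-R sketch. The data frame `d`, its design (2 rooms, 3 tanks per room), and the use of `interaction()` are assumptions made for illustration, not the post's own code or data.

```r
# Hypothetical design: 2 rooms, 3 tanks per room, 4 measures per tank.
# Tank labels 1-3 are reused in each room, so "tank" alone does not
# identify a physical tank.
d <- expand.grid(rep = 1:4, tank = factor(1:3), room = factor(c("A", "B")))

# Combined label that is unique to each physical tank (assumed analogue of "tank2")
d$tank2 <- interaction(d$room, d$tank, drop = TRUE)

nlevels(d$tank)   # 3 labels, shared across rooms
nlevels(d$tank2)  # 6 unique room-by-tank combinations

# In lme4 formula syntax (lme4 is not loaded here; y and treatment are
# hypothetical names), these specify the same nested random intercepts:
#   y ~ treatment + (1 | room/tank)
#   y ~ treatment + (1 | room) + (1 | room:tank)
#   y ~ treatment + (1 | room) + (1 | tank2)
```

With reused labels, `(1 | room) + (1 | tank)` would instead treat tank as crossed with room, which is a common pitfall with this kind of coding.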

What does cell biology data look like?

June 9, 2019 in stats 101

If I’m going to evaluate the widespread use of t-tests/ANOVAs on count data in bench biology then I’d like to know what these data look like, specifically the shape (“overdispersion”) parameter.

Set up

library(ggplot2)
library(readxl)
library(ggpubr)
library(cowplot)
library(plyr) #mapvalues
library(data.table)
# glm packages
library(MASS)
library(pscl) #zeroinfl
library(DHARMa)
library(mvabund)
data_path <- "../data" # notebook, console
source("../../../R/clean_labels.R") # notebook, console

Data from The enteric nervous system promotes intestinal health by constraining microbiota composition

Import

read_enteric <- function(sheet_i, range_i, file_path, wide_2_long=TRUE){
  dt_wide <- data.

GLM vs. t-tests vs. non-parametric tests if all we care about is NHST -- Update

May 30, 2019 in stats 101

Update to the earlier post, which was written in response to my own thinking about how to teach statistics to experimental biologists working in fields that are dominated by hypothesis testing instead of estimation. That is, should these researchers learn GLMs, or is a t-test on raw or log-transformed count data good enough – or even superior? My post was written without the benefit of either [Ives](Ives, Anthony R.

GLM vs. t-tests vs. non-parametric tests if all we care about is NHST

January 7, 2019 in stats 101

This post has been updated. A skeleton simulation of different strategies for NHST for count data if all we care about is a p-value, as in bench biology where p-values are used simply to give one confidence that something didn’t go terribly wrong (similar to doing experiments in triplicate – it’s not the effect size that matters, only “we have experimental evidence of a replicable effect”). tl;dr - At least for Type I error at small \(n\), log(response) and Wilcoxon have the best performance over the simulation space.
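A minimal base-R sketch of this kind of Type I error simulation is given here for orientation only. The negative binomial parameters (`mu`, `theta`), the sample size, the number of iterations, and the `log(y + 1)` transform are all assumptions chosen for illustration; they are not the simulation space or the code used in the post.

```r
# Sketch: Type I error of a t-test on log counts vs. a Wilcoxon test
# when both groups come from the same negative binomial distribution.
set.seed(1)
n      <- 5      # per-group sample size (assumed)
n_iter <- 2000   # number of simulated experiments (assumed)
mu     <- 10     # common mean count (assumed)
theta  <- 1      # negative binomial shape parameter (assumed)

p <- matrix(NA_real_, nrow = n_iter, ncol = 2,
            dimnames = list(NULL, c("log_t", "wilcoxon")))

for (i in seq_len(n_iter)) {
  y1 <- rnbinom(n, size = theta, mu = mu)
  y2 <- rnbinom(n, size = theta, mu = mu)
  # t-test on log(y + 1); the +1 guards against log(0) and is an assumption.
  # t.test() errors on constant data, so record NA in that rare case.
  p[i, "log_t"] <- tryCatch(
    t.test(log(y1 + 1), log(y2 + 1), var.equal = TRUE)$p.value,
    error = function(e) NA_real_
  )
  # Normal approximation, because counts produce ties
  p[i, "wilcoxon"] <- wilcox.test(y1, y2, exact = FALSE)$p.value
}

# Proportion of null experiments with p < 0.05 for each method
type_1 <- colMeans(p < 0.05, na.rm = TRUE)
print(round(type_1, 3))
```

Adding further columns (for example a negative binomial GLM) and looping over several values of `n`, `mu`, and `theta` would turn the sketch into a comparison across a simulation space.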

Should the model-averaged prediction be computed on the link or response scale in a GLM?

May 13, 2018

[updated to include additional output from MuMIn, BMA, and BAS] This post is a follow-up to my initial post, which was written as a way for me to pen my thoughts on the recent review “Model averaging in ecology: a review of Bayesian, information‐theoretic and tactical approaches for predictive inference”. It was also written without contacting the authors or discussing the issue with them. This post benefits from a series of e-mails with the lead author Carsten Dormann and the last author Florian Hartig.

R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns