The motivation for this post was to create a pipeline for generating publication-ready plots entirely within ggplot and avoid post-generation touch-ups in Illustrator or Inkscape. These scripts are a start. The ideal modification would be turning the chunks into functions with personalized detail so that a research team could quickly and efficiently generate multiple plots. I might try to turn the scripts into a very-general-but-not-ready-for-r-package function for my students.
Continue to the whole post
A factorial experiment is one in which there are two or more factor variables (categorical \(X\)) that are crossed, resulting in a group for each combination of the levels of each factor. Factorial experiments are used to estimate the interaction effect between factors. Two factors interact when the effect of one factor depends on the level of the other factors. Interactions are ubiquitous, although sometimes they are small enough to ignore with little to no loss of understanding.
motivating source: Integration of two herbivore-induced plant volatiles results in synergistic effects on plant defense and resistance
What is synergism or antagonism? (this post is a follow up to What is an interaction?)
In the experiment for Figure 1 of the motivating source article, the researchers were explicitly interested in measuring any synergistic effects of hac and indole on the response. What is a synergistic effect? If hac and indole act independently, then the response should be additive – the HAC+Indole effect should simply be the sum of the independent HAC and Indole effects.
In R, so-called “Type I sums of squares” are default. With balanced designs, inferential statistics from Type I, II, and III sums of squares are equal. Type III sums of squares are returned using car::Anova instead of base R anova. But to get the correct Type III statistics, you cannot simply specify car:Anova(m1, type = 3). You also have to set the contrasts in the model matrix to contr.sum in your linear model fit.
Warning - This is a long, exploratory post on Q-Q plots motivated by the specific data set analyzed below and the code follows my stream of thinking this through. I have not gone back through to economize length. So yeh, some repeated code I’ve turned into functions and other repeated code is repeated.
This post is not about how to interpret a Q-Q plot but about which Q-Q plot? to interpret.
This is fake data that simulates an experiment to measure effect of treatment on fat weight in mice. The treatment is “diet” with two levels: “control” (blue dots) and “treated” (gold dots). Diet has a large effect on total body weight. The simulated data are in the plot above - these look very much like the real data.
The question is, what are problems with using an “ancova” linear model to estimate the direct effect of treatment on fat weight?
TL;DR A sample table from the full results for data that look like this
Table 1: Coverage of 95% bca CIs. parameter n=5 n=10 n=20 n=40 n=80 means Control 81.4 87.6 92.2 93.0 93.6 b4GalT1-/- 81.3 90.2 90.8 93.0 93.8 difference in means diff 83.
Motivator: Novel metabolic role for BDNF in pancreatic β-cell insulin secretion
I’ll finish this some day…
knitr::opts_chunk$set(echo = TRUE, message=FALSE) library(tidyverse) library(data.table) library(mvtnorm) library(lmerTest) normal response niter <- 2000 n <- 9 treatment_levels <- c("cn", "high", "high_bdnf") insulin <- data.table(treatment = rep(treatment_levels, each=n)) X <- model.matrix(~ treatment, data=insulin) beta <- c(0,0,0) # no effects # the three responses are taken from the same cluster of cells and so have expected # correlation rho.
An answer to this tweet “Are there any #Rstats tidy expeRts who’d be interested in improving the efficiency of this code that gathers multiple variables from wide to long?
This works but it’s not pretty. There must be a prettier way…"
Wide data frame has three time points where participants answer two questions on two topics.
create data from original code #Simmed data Time1.Topic1.Question1 <- rnorm(500) data <- data.frame(Time1.Topic1.Question1) data$Time1.TOpic1.Question2 <- rnorm(500) data$Time1.