A factorial experiment is one in which there are two or more factor variables (categorical \(X\)) that are crossed, resulting in a group for each combination of the levels of each factor. Factorial experiments are used to estimate the interaction effect between factors. Two factors interact when the effect of one factor depends on the level of the other factors. Interactions are ubiquitous, although sometimes they are small enough to ignore with little to no loss of understanding.
motivating source: Integration of two herbivore-induced plant volatiles results in synergistic effects on plant defense and resistance
What is synergism or antagonism? (this post is a follow up to What is an interaction?)
In the experiment for Figure 1 of the motivating source article, the researchers were explicitly interested in measuring any synergistic effects of hac and indole on the response. What is a synergistic effect? If hac and indole act independently, then the response should be additive – the HAC+Indole effect should simply be the sum of the independent HAC and Indole effects.
In R, so-called “Type I sums of squares” are default. With balanced designs, inferential statistics from Type I, II, and III sums of squares are equal. Type III sums of squares are returned using car::Anova instead of base R anova. But to get the correct Type III statistics, you cannot simply specify car:Anova(m1, type = 3). You also have to set the contrasts in the model matrix to contr.sum in your linear model fit.
Warning - This is a long, exploratory post on Q-Q plots motivated by the specific data set analyzed below and the code follows my stream of thinking this through. I have not gone back through to economize length. So yeh, some repeated code I’ve turned into functions and other repeated code is repeated.
This post is not about how to interpret a Q-Q plot but about which Q-Q plot? to interpret.
This is fake data that simulates an experiment to measure effect of treatment on fat weight in mice. The treatment is “diet” with two levels: “control” (blue dots) and “treated” (gold dots). Diet has a large effect on total body weight. The simulated data are in the plot above - these look very much like the real data.
The question is, what are problems with using an “ancova” linear model to estimate the direct effect of treatment on fat weight?
TL;DR A sample table from the full results for data that look like this
Table 1: Coverage of 95% bca CIs. parameter n=5 n=10 n=20 n=40 n=80 means Control 81.4 87.6 92.2 93.0 93.6 b4GalT1-/- 81.3 90.2 90.8 93.0 93.8 difference in means diff 83.