Copyright © Jeffrey A. Walker
motivating source: Integration of two herbivore-induced plant volatiles results in synergistic effects on plant defense and resistance
tags: factorial design, interaction, synergism, two-way anova, linear model
(this post is a follow up to What is an interaction?)
In the experiment for Figure 1 of the motivating source article, the authors compared the defense response to exposure to four different combinations of volatile compounds. These four combinations are:

1. Control (neither volatile)
2. HAC
3. Indole
4. HAC+Indole

The design is a fully crossed, \(2 \times 2\) factorial experiment – there are two factors (categorical \(X\) variables), each with two levels:

- `hac`, with levels “HAC-” and “HAC+”
- `indole`, with levels “Indole-” and “Indole+”

The researchers were explicitly interested in measuring any synergistic effects of `hac` and `indole` on the response. What is a synergistic effect? If `hac` and `indole` act independently, then the response should be additive – the HAC+Indole effect should simply be the sum of the independent HAC and Indole effects. A HAC+Indole effect that is larger than the sum of the two independent effects is evidence of a synergistic (positive) interaction. A HAC+Indole effect that is less than the sum of the two independent effects is evidence of an antagonistic (negative) interaction.
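Written as a linear model with dummy-coded (0/1) indicators for the two factors, the expected response is

\[\mathrm{E}[y] = b_0 + b_1\,\mathrm{hac} + b_2\,\mathrm{indole} + b_3\,(\mathrm{hac} \times \mathrm{indole})\]

where additivity corresponds to \(b_3 = 0\), a synergistic interaction to \(b_3 > 0\), and an antagonistic interaction to \(b_3 < 0\).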
Synergism or antagonism is very easy to estimate given a fully factorial design – it is simply the interaction coefficient from the linear model, using dummy coding for the indicator variables. The researchers didn’t use this coefficient to infer synergism but instead computed the effect manually and then invented a test to compute a p-value. I see something like this pretty regularly in the experimental biology literature, which makes me think that many biology researchers are not familiar with using a factorial linear model to estimate interaction effects, or with what an interaction effect is. They are probably not familiar with factorial linear models and the meaning of the coefficients because this isn’t taught. Instead, 2-way ANOVA is.
So, let’s fit a factorial linear model and show that the coefficient of the interaction is the measure of synergism (or antagonism).
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)
library(here)
library(janitor)
library(readxl)
library(data.table)
# graphics and tables
library(ggplot2)
library(kableExtra)
# analysis packages
library(lmPerm)
here <- here::here
data_folder <- "content/data"
data_from <- "Integration of two herbivore-induced plant volatiles results in synergistic effects on plant defense and resistance"
file_name <- "Data for Dryad.xlsx"
file_path <- here(data_folder, data_from, file_name)
fig1 <- read_excel(file_path,
sheet = "Fig. 1",
range = "A3:K23", # 1 blank column
col_names = TRUE) %>%
data.table() %>%
clean_names()
# create proper treatment variables for each factor
fig1[treat == "Control", c("hac", "indole") := list("HAC-", "Indole-")]
fig1[treat == "HAC", c("hac", "indole") := list("HAC+", "Indole-")]
fig1[treat == "Indole", c("hac", "indole") := list("HAC-", "Indole+")]
fig1[treat == "HAC+Indole", c("hac", "indole") := list("HAC+", "Indole+")]
treat_levels <- c("Control", "HAC", "Indole", "HAC+Indole")
fig1[, treat := factor(treat, levels = treat_levels)]
fig1[, hac := factor(hac, levels = c("HAC-", "HAC+"))] # make "HAC-" the reference level explicit
fig1[, indole := factor(indole, levels = c("Indole-", "Indole+"))] # make "Indole-" the reference level explicit
# View(fig1)
The archived Excel file contains only a single factor variable (`treat`) with the \(2 \times 2\) experimental treatment combinations flattened into a single \(4 \times 1\) grouping variable. The script above creates the two experimental treatment factors `hac` and `indole`.
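A quick sanity check (not in the original script) that the recoding reproduces the original `treat` groups:

fig1[, .N, by = .(treat, hac, indole)] # one row per treatment combination, with its replicate count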
I’ll use the data in Figure 1a of the source article as the example. This figure shows the tissue concentration of the defense hormone abscisic acid (`aba` in the data) in response to the four treatment combinations.
# fit the fully factorial linear model (R's default treatment contrasts = dummy coding)
m1 <- lm(aba ~ hac*indole, data = fig1)
m1_coef <- cbind(coef(summary(m1)),
confint(m1))
m1_table <- knitr::kable(m1_coef,
digits = c(1,2,2,3,1,1),
caption = "Coefficient table")
m1_table %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 19.2     | 3.23       | 5.94    | 0.000      | 12.3  | 26.0   |
| hacHAC+               | 13.3     | 4.56       | 2.91    | 0.010      | 3.6   | 23.0   |
| indoleIndole+         | 8.8      | 4.56       | 1.92    | 0.072      | -0.9  | 18.5   |
| hacHAC+:indoleIndole+ | 12.8     | 6.45       | 1.99    | 0.064      | -0.9  | 26.5   |
The interaction effect on ABA is 12.8 (95% CI: -0.9, 26.5) ng per g FW. I’m not a plant defense expert, so I don’t know what counts as a big vs. small effect. Regardless, a synergistic effect as big as the single-factor effects (HAC and Indole) is consistent with the data. A purely additive effect is less consistent. A negative (antagonistic) effect, at least one of any reasonable magnitude, is much less consistent with the data.
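Another way to read the interaction coefficient: it is how much the HAC effect changes when Indole is also present. From the coefficient table, the effect of HAC at Indole- is \(b_1 = 13.3\), while the effect of HAC at Indole+ is \(b_1 + b_3 = 13.3 + 12.8 = 26.1\) – roughly double.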
The p-value for the interaction effect is 0.064. The researchers compare p-values to 0.05 throughout. If we are making this comparison to control Type I error rates (and why else would we compare to a bright line?), then the actual p-value doesn’t really matter. There is no “marginally significant” or “trending toward significance”. We simply accept that we failed to find evidence against the null (an additive model) and move on.
The researchers used the FDR to adjust p-values. First, for each response, did the researchers include their test for synergism in the family? I suspect not, but also, how would one even properly include it, given that the t-value for this test is not independent of the post-hoc tests among groups? Second, what is the “family” here? There are multiple (nine) responses for the same experiment – I would include all of these in the same family (were I to use a multiple-testing adjustment, which I probably wouldn’t). That is 6 post-hoc tests plus 1 synergism test, times 9 responses, or 63 tests.
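The adjustment itself is a one-liner in R – a minimal sketch, assuming `p_family` is a (hypothetical) vector holding the 63 unadjusted p-values:

# p_family: hypothetical vector of the 63 unadjusted p-values (7 tests x 9 responses)
p_fdr <- p.adjust(p_family, method = "fdr") # Benjamini-Hochberg FDR across the whole family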
From the methods of the source article:

> Potential synergism was evaluated using a previously described approach (Machado et al., 2018). Briefly, we calculated additive effects by randomly pairing replicates of individual volatile treatments (an indole treated plant [\(I_n\)] and a HAC treated plant [\(H_n\)]). For each random pair, we calculated theoretical additive values (\(A_n\)) for the different defence parameters using the following formula: \(A_n = I_n + H_n - C_{av}\), where \(C_{av}\) corresponds to the average level of nonexposed control plants. The calculated additive values were then compared with the measured treatment values of the double volatile treatment using Student’s t tests. Cases in which the measured level of the double volatile treatment was significantly greater than the calculated additive level were classified as synergistic.

I only saw less, not more, detail in Machado et al. 2018.
The researchers’ method of computing the additive and synergistic effects is simply the underlying math of the interaction effect when the indicator variables are dummy coded. Using the researchers’ notation, the expected additive response is \(C_{av} + (H_{av} - C_{av}) + (I_{av} - C_{av})\). This reduces to the researchers’ equation if we substitute a sampled pair for \(H_{av}\) and \(I_{av}\), but it is also equal to \(b_0 + b_1 + b_2\), since

- \(b_0 = C_{av}\) (the `(Intercept)` in the table),
- \(b_1 = H_{av} - C_{av}\), the effect of `hac` (`hacHAC+` in the table), and
- \(b_2 = I_{av} - C_{av}\), the effect of `indole` (`indoleIndole+` in the table).

The interaction effect (the coefficient \(b_3\), which is `hacHAC+:indoleIndole+` in the table) is

\[b_3 = HI_{av} - (b_0 + b_1 + b_2),\]

where \(HI_{av}\) is the mean of the “HAC+Indole” group. This simply shows that the coefficient \(b_3\) is the “synergistic effect” computed by the researchers. And there is no need to do this computation by hand, since it is computed for us if we use dummy (treatment) coding, which is the default contrast coding in R but not in most other software.
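We can check this numerically from the group means (a quick verification, not part of the original analysis):

# group means, using the researchers' notation
c_av <- mean(fig1[treat == "Control", aba])
h_av <- mean(fig1[treat == "HAC", aba])
i_av <- mean(fig1[treat == "Indole", aba])
hi_av <- mean(fig1[treat == "HAC+Indole", aba])
# the "synergistic effect" computed from the group means ...
hi_av - (h_av + i_av - c_av)
# ... equals the interaction coefficient from the fitted model (12.8 in the table above)
coef(m1)["hacHAC+:indoleIndole+"]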
The researchers in this paper aren’t the first to compute the synergistic (interaction) effect like this. As I said above, I see something like this pretty regularly in the experimental biology literature, which makes me think that many biology researchers are not familiar with using a factorial linear model to estimate interaction effects, or with what an interaction effect is. They are probably not familiar with factorial linear models and the meaning of the coefficients because this isn’t taught. Instead, 2-way ANOVA is.
This is my understanding of the test invented by the researchers:
hi_n <- fig1[treat == "HAC+Indole", aba]
c_av <- mean(fig1[treat == "Control", aba])
# randomly permute the replicates within each treatment to create the random pairings
c_n <- sample(fig1[treat == "Control", aba])
h_n <- sample(fig1[treat == "HAC", aba])
i_n <- sample(fig1[treat == "Indole", aba])
a_n <- h_n + i_n - c_av # the authors' theoretical additive values (A_n)
a_n_2 <- h_n + i_n - c_n # why not also sample control?
# the "synergism test"
t.test(a_n, hi_n, var.equal = TRUE)$p.value
## [1] 0.1228597
# using the sampled control
t.test(a_n_2, hi_n, var.equal = TRUE)$p.value
## [1] 0.127285
This is highly dependent on the random sample. The mean of \(a_n\) is the same regardless of the random pairing, but the sd differs with each random pairing. This means the t-value and p-value depend on which pairs are sampled. All resampling methods have a stochastic component, but not in this sense. If I understand the method, there is no way to minimize the size of the stochastic component, as in something like the bootstrap (where the stochastic component is decreased by resampling more).
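To see why the mean of the \(A_n\) doesn’t depend on the pairing, note that each replicate is used exactly once (the treatments have equal replication here), so

\[\bar{A} = \frac{1}{n}\sum_{k=1}^{n}\left(H_k + I_k - C_{av}\right) = \bar{H} + \bar{I} - C_{av}\]

for every permutation, while the spread of the \(A_n\) – and therefore the t-statistic and p-value – changes with how the \(H_n\) and \(I_n\) happen to be matched.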
Let’s repeat this many times in a loop and look at the variation in the p-value.
n_samples <- 100
hi_n <- fig1[treat == "HAC+Indole", aba]
c_av <- mean(fig1[treat == "Control", aba])
t_p <- numeric(n_samples)
sd_a_n <- numeric(n_samples)
for(i in 1:n_samples){
c_n <- sample(fig1[treat == "Control", aba])
h_n <- sample(fig1[treat == "HAC", aba])
i_n <- sample(fig1[treat == "Indole", aba])
a_n <- h_n + i_n - c_av # the authors' theoretical additive values (A_n)
# the "synergism test"
t_p[i] <- t.test(a_n, hi_n, var.equal = TRUE)$p.value
sd_a_n[i] <- sd(a_n)
}
qplot(t_p)
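A numeric summary of the spread (output not shown; the values depend on the particular random pairings):

quantile(t_p, probs = c(0, 0.25, 0.5, 0.75, 1))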
Let’s make a function to do the same analysis for each response.
# fit the factorial model for response y and return the coefficient table with 95% CIs
lm_coef <- function(y = "aba"){
form <- formula(paste0(y, " ~ hac*indole"))
fit <- lm(form, data = fig1)
fit_coef <- cbind(coef(summary(fit)),
confint(fit))
return(fit_coef)
}
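As an aside, instead of repeating the chunk below once per response, the interaction rows for all nine responses could be collected into a single table – a sketch, with the response column names taken from the chunks that follow:

responses <- c("opda", "ja", "ja_ile", "sa", "aba",
               "zm_pr1", "zm_pr5", "zm_lox", "zm_aos")
# one row per response: interaction estimate, SE, t, p, and 95% CI
synergism_table <- t(sapply(responses, function(y) lm_coef(y)["hacHAC+:indoleIndole+", ]))
# knitr::kable(synergism_table, digits = 3)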
y <- "opda"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 240.1    | 27.05      | 8.88    | 0.000      | 182.8 | 297.5  |
| hacHAC+               | 75.0     | 38.25      | 1.96    | 0.068      | -6.1  | 156.1  |
| indoleIndole+         | 67.4     | 38.25      | 1.76    | 0.097      | -13.7 | 148.5  |
| hacHAC+:indoleIndole+ | 66.6     | 54.10      | 1.23    | 0.236      | -48.1 | 181.3  |
y <- "ja"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 43.9     | 4.88       | 8.99    | 0.000      | 33.5  | 54.2   |
| hacHAC+               | 15.6     | 6.90       | 2.26    | 0.038      | 1.0   | 30.2   |
| indoleIndole+         | 13.1     | 6.90       | 1.90    | 0.076      | -1.5  | 27.7   |
| hacHAC+:indoleIndole+ | 29.8     | 9.76       | 3.05    | 0.008      | 9.1   | 50.5   |
y <- "ja_ile"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 114.1    | 11.44      | 9.98    | 0.000      | 89.9  | 138.4  |
| hacHAC+               | 42.5     | 16.18      | 2.63    | 0.018      | 8.2   | 76.8   |
| indoleIndole+         | 39.6     | 16.18      | 2.45    | 0.026      | 5.3   | 73.9   |
| hacHAC+:indoleIndole+ | 53.1     | 22.88      | 2.32    | 0.034      | 4.6   | 101.6  |
y <- "sa"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 %  | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|-------:|-------:|
| (Intercept)           | 648.3    | 49.46      | 13.11   | 0.000      | 543.5  | 753.2  |
| hacHAC+               | -95.0    | 69.94      | -1.36   | 0.193      | -243.2 | 53.3   |
| indoleIndole+         | 2.6      | 69.94      | 0.04    | 0.971      | -145.6 | 150.9  |
| hacHAC+:indoleIndole+ | 31.9     | 98.91      | 0.32    | 0.752      | -177.8 | 241.5  |
y <- "aba"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 19.2     | 3.23       | 5.94    | 0.000      | 12.3  | 26.0   |
| hacHAC+               | 13.3     | 4.56       | 2.91    | 0.010      | 3.6   | 23.0   |
| indoleIndole+         | 8.8      | 4.56       | 1.92    | 0.072      | -0.9  | 18.5   |
| hacHAC+:indoleIndole+ | 12.8     | 6.45       | 1.99    | 0.064      | -0.9  | 26.5   |
y <- "zm_pr1"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 1.2      | 0.27       | 4.44    | 0.000      | 0.6   | 1.8    |
| hacHAC+               | -0.1     | 0.38       | -0.25   | 0.809      | -0.9  | 0.7    |
| indoleIndole+         | -0.4     | 0.38       | -1.00   | 0.334      | -1.2  | 0.4    |
| hacHAC+:indoleIndole+ | 0.2      | 0.53       | 0.40    | 0.694      | -0.9  | 1.3    |
y <- "zm_pr5"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 1.1      | 0.23       | 4.84    | 0.000      | 0.6   | 1.6    |
| hacHAC+               | 0.0      | 0.32       | 0.01    | 0.992      | -0.7  | 0.7    |
| indoleIndole+         | 0.0      | 0.32       | 0.01    | 0.996      | -0.7  | 0.7    |
| hacHAC+:indoleIndole+ | -0.2     | 0.46       | -0.37   | 0.714      | -1.1  | 0.8    |
y <- "zm_lox"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 1.0      | 0.14       | 7.30    | 0.000      | 0.7   | 1.3    |
| hacHAC+               | 0.7      | 0.20       | 3.33    | 0.004      | 0.2   | 1.1    |
| indoleIndole+         | 0.6      | 0.20       | 3.11    | 0.007      | 0.2   | 1.0    |
| hacHAC+:indoleIndole+ | 0.5      | 0.28       | 1.89    | 0.077      | -0.1  | 1.1    |
y <- "zm_aos"
knitr::kable(lm_coef(y),
digits = c(1,2,2,3,1,1),
caption = "Coefficient table") %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE,
position = "center")
|                       | Estimate | Std. Error | t value | Pr(>\|t\|) | 2.5 % | 97.5 % |
|-----------------------|---------:|-----------:|--------:|-----------:|------:|-------:|
| (Intercept)           | 1.1      | 0.32       | 3.35    | 0.004      | 0.4   | 1.7    |
| hacHAC+               | 1.0      | 0.45       | 2.33    | 0.033      | 0.1   | 2.0    |
| indoleIndole+         | 2.1      | 0.45       | 4.75    | 0.000      | 1.2   | 3.1    |
| hacHAC+:indoleIndole+ | 1.7      | 0.63       | 2.74    | 0.015      | 0.4   | 3.1    |