Copyright © Jeffrey A. Walker

Back to R doodles

motivating source: Integration of two herbivore-induced plant volatiles results in synergistic effects on plant defense and resistance

tags: factorial design, interaction, synergism, two-way anova, linear model

1 What is synergism or antagonism?

(this post is a follow up to What is an interaction?)

In the experiment for Figure 1 of the motivating source article, the authors compared the defense response to exposure to four different combinations of volatile compounds. These four combinations are:

  1. Control – the plants were not exposed to either volatile compound.
  2. HAC – the plants were exposed to the volatile compound (Z)‐3‐hexenyl acetate (HAC).
  3. Indole – the plants were exposed to the volatile compound Indole.
  4. HAC+Indole – the plants were exposed to both HAC and Indole.

The design is a fully crossed, \(2 \times 2\) factorial experiment – there are two factors (categorical \(X\) variables) each with two levels.

  1. Factor 1: hac with levels “HAC-” and “HAC+”
  2. Factor 2: indole with levels “Indole-” and “Indole+”

The researchers were explicitly interested in measuring any synergistic effects of hac and indole on the response. What is a synergistic effect? If hac and indole act independently, then the response should be additive – the HAC+Indole effect should simply be the sum of the independent HAC and Indole effects. A HAC+Indole effect that is larger than the sum of the two independent effects is evidence of a synergistic (positive) interaction. A HAC+Indole effect that is less than the sum of the two independent effects is evidence of an antagonistic (negative) interaction.

Synergism or antagonism are very easy to estimate given a fully factorial design – these are simply the interaction coefficient from the linear model using dummy coding for the indicator variables. The researchers didn’t use this coefficient to infer synergism but instead computed the effect manually and then invented a test to compute a p-value. I see something like this pretty regularly in the experimental biology literature, which makes me think that many biology researchers are not familiar with factorial linear models to estimate interaction effects and what an interaction effect is. They are probably not familiar with factorial linear models and the meaning of the coefficients because this isn’t taught. Instead, 2-way ANOVA is.

So, let’s fit a factorial linear model and show that the coefficient of the interaction is the measure of synergism (or antagonism).

2 Setup and Import

knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)
library(here)
library(janitor)
library(readxl)
library(data.table)

# graphics and tables
library(ggplot2)
library(kableExtra)

# analysis packages
library(lmPerm)

here <- here::here
data_folder <- "content/data"
data_from <- "Integration of two herbivore-induced plant volatiles results in synergistic effects on plant defense and resistance"
file_name <- "Data for Dryad.xlsx"
file_path <- here(data_folder, data_from, file_name)

fig1 <- read_excel(file_path,
                     sheet = "Fig. 1",
                     range = "A3:K23", # 1 blank column
                     col_names = TRUE) %>%
  data.table() %>%
  clean_names()

# create proper treatment variables for each factor
fig1[treat == "Control", c("hac", "indole") := list("HAC-", "Indole-")]
fig1[treat == "HAC", c("hac", "indole") := list("HAC+", "Indole-")]
fig1[treat == "Indole", c("hac", "indole") := list("HAC-", "Indole+")]
fig1[treat == "HAC+Indole", c("hac", "indole") := list("HAC+", "Indole+")]

treat_levels <- c("Control", "HAC", "Indole", "HAC+Indole")
fig1[, treat := factor(treat, levels = treat_levels)]
fig1[, hac := factor(hac)] # levels in correct order
fig1[, indole := factor(indole)] # levels in correct order
# View(fig1)

The archived Excel file contains only a single factor variable (treat) with the \(2 \times 2\) experimental treatment combinations flattened into a single \(4 \times 1\) grouping variable. The script above creates the two experimental treatment factors hac and indole.

3 Analysis

I’ll use the data in Figure 1a of the source article as the example. This figure shows the tissue concentration of the defense hormone abcisic acid (aba) in response to the four treatment combinations.

m1 <- lm(aba ~ hac*indole, data = fig1)
m1_coef <- cbind(coef(summary(m1)),
                 confint(m1))
m1_table <- knitr::kable(m1_coef,
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table")
m1_table %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 1: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 19.2 3.23 5.94 0.000 12.3 26.0
hacHAC+ 13.3 4.56 2.91 0.010 3.6 23.0
indoleIndole+ 8.8 4.56 1.92 0.072 -0.9 18.5
hacHAC+:indoleIndole+ 12.8 6.45 1.99 0.064 -0.9 26.5

The interaction effect on ABA is 12.8 (95% CI: -0.9, 26.5) ng per g FW. I’m not a plant defense expert so I don’t know what is a big vs. small effect. Regardless, a synergistic effect as big as the single factor effects (HAC and Indole) is consistent with the data. An additive effect is less consistent. A negative (antagonistic) effect, at least one of any reasonable magnitude, is much less consistent with the data.

3.1 A note on the p-value

The p-value for the interaction effect is 0.064. The researchers compare p-values to 0.05 throughout. If we are making this comparison to control Type I error rates (and why else would be compare to a bright line?) then the actual p-value doesn’t really matter. There is no “marginally signficant” or “trending toward significance”. We simply accept that we failed to find evidence against the null (and additive model) and move on.

3.2 Another note on the p-value.

The researchers used the FDR to adjust p-values. First, for each response, did the researchers include their test for synergism in the family? I suspect not, but also, how would one even properly include it given that the t-value for this test is not independent of the post-hoc tests among groups. Second, what is the “family” here? There are multiple (nine) responses for the same experiment – I would include all of these in the same family (were I to use a multiple testing adjustment, which I probably wouldn’t). This is 6 post-hoc tests plus 1 synergism test times 9 responses, or 63 tests.

4 The researcher’s computation of synergism is the interaction coefficient of the fit linear model

Potential synergism was evaluated using a previously described approach (Machado et al., 2018). Briefly, we calculated additive effects by randomly pairing replicates of individual volatile treatments (an indole treated plant [\(I_n\)] and a HAC treated plant [\(H_n\)]). For each random pair, we calculated theoretical additive values (\(A_n\)) for the different defence parameters using the following formula: \(A_n = I_n + H_n − C_{av}\), where \(C_{av}\) corresponds to the average level of nonexposed control plants. The calculated additive values were then compared with the measured treatment values of the double volatile treatment using Student’s t tests. Cases in which the measured level of the double volatile treatment was significantly greater than the calculated additive level were classified as synergistic.

I only saw less, not more, detail in Machodo et al. 2018.

The researchers method of computing the additive and synergistic effects is simply the underlying math of the interaction effect if the indicator variables are dummy coded. Using the researchers notation, the expected additive response is \(C_{av} + (H_{av} - C_{av}) + (I_{av} - C_{av})\). This reduces to the researcher’s equation if we substitute a sampled pair with \(H_{av}\) and \(I_{av}\) but it is also equal to \(b_0 + b_1 + b_2\), since

  1. \(b_0 = C_{av}\) is the intercept of the fit model ((Intercept) in the table)
  2. \(b_1 = H_{av} - C_{av}\) is the coefficient of hac (hacHAC+ in the table).
  3. \(b_2 = I_{av} - C_{av}\) is the coefficient of indole (indoleIndole+ in the table)

The interaction effect (the coefficient \(b_3\)) is

  1. \(b_3 = HI_{av} - (b_0 + b_1 + b_2)\) is the coefficient of the interaction (hacHAC+:indoleIndole+ in the table)

where \(HI_n\) is the mean of the “HAC+Indole” group. This simply shows that the coefficient \(b_3\) is the “synergistic effect” computed by the researchers. And, there is no need to do this computation since it is computed for us if we use dummy (treatment) coding, which is the default contrast coding in R but not most other software.

The researchers in this paper aren’t the first to compute the synergistic (interaction) effect like this. As I said above, I see something like this pretty regularly in the experimental biology literature, which makes me think that many biology researchers are not familiar with factorial linear models to estimate interaction effects and what an interaction effect is. They are probably not familiar with factorial linear models and the meaning of the coefficients because this isn’t taught. Instead, 2-way ANOVA is.

5 The researcher’s test has a weird dependence on randomness

This is my understanding of the test invented by the researchers

hi_n <- fig1[treat == "HAC+Indole", aba]
c_av <- mean(fig1[treat == "Control", aba])
c_n <- sample(fig1[treat == "Control", aba])
h_n <- sample(fig1[treat == "HAC", aba])
i_n <- sample(fig1[treat == "Indole", aba])
a_n <- h_n + i_n - c_av # the author's test
a_n_2 <- h_n + i_n - c_n # why not also sample control?

# the "synergism test"
t.test(a_n, hi_n, var.equal = TRUE)$p.value
## [1] 0.1228597
# using the sampled control
t.test(a_n_2, hi_n, var.equal = TRUE)$p.value
## [1] 0.127285

This is highly dependent on the random sample. The mean of \(a_n\) is the same regardless of the random pairing, but the sd differs with each random pairing. This means the t and p-value are dependent on which pairs are sampled. all resampling methods have a stochastic component but not in this sense. If I understand the method, there is no way to minimize the size of the stochastic component as in something like the bootstrap (where the stochastic component is decreaesed by resampling more).

Let’s make a function out of this and look at the variation in the p-value

n_samples <- 100
hi_n <- fig1[treat == "HAC+Indole", aba]
c_av <- mean(fig1[treat == "Control", aba])

t_p <- numeric(n_samples)
sd_a_n <- numeric(n_samples)
for(i in 1:n_samples){
  c_n <- sample(fig1[treat == "Control", aba])
  h_n <- sample(fig1[treat == "HAC", aba])
  i_n <- sample(fig1[treat == "Indole", aba])
  a_n <- h_n + i_n - c_av # the author's test
  # the "synergism test"
  t_p[i] <- t.test(a_n, hi_n, var.equal = TRUE)$p.value
  sd_a_n[i] <- sd(a_n)
}
qplot(t_p)
A histogram of the p-values from the different combinations of sampled pairs used to compute the expected additive effect

Figure 1: A histogram of the p-values from the different combinations of sampled pairs used to compute the expected additive effect

6 Analysis of all responses

Let’s make a function to do the same analysis on each response

lm_coef <- function(y = "aba"){
  form <- formula(paste0(y, " ~ hac*indole"))
  fit <- lm(form, data = fig1)
  fit_coef <- cbind(coef(summary(fit)),
                   confint(fit))
  return(fit_coef)
}

6.1 opda

y <- "opda"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 2: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 240.1 27.05 8.88 0.000 182.8 297.5
hacHAC+ 75.0 38.25 1.96 0.068 -6.1 156.1
indoleIndole+ 67.4 38.25 1.76 0.097 -13.7 148.5
hacHAC+:indoleIndole+ 66.6 54.10 1.23 0.236 -48.1 181.3

6.2 ja

y <- "ja"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 3: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 43.9 4.88 8.99 0.000 33.5 54.2
hacHAC+ 15.6 6.90 2.26 0.038 1.0 30.2
indoleIndole+ 13.1 6.90 1.90 0.076 -1.5 27.7
hacHAC+:indoleIndole+ 29.8 9.76 3.05 0.008 9.1 50.5

6.3 ja_ile

y <- "ja_ile"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 4: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 114.1 11.44 9.98 0.000 89.9 138.4
hacHAC+ 42.5 16.18 2.63 0.018 8.2 76.8
indoleIndole+ 39.6 16.18 2.45 0.026 5.3 73.9
hacHAC+:indoleIndole+ 53.1 22.88 2.32 0.034 4.6 101.6

6.4 sa

y <- "sa"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 5: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 648.3 49.46 13.11 0.000 543.5 753.2
hacHAC+ -95.0 69.94 -1.36 0.193 -243.2 53.3
indoleIndole+ 2.6 69.94 0.04 0.971 -145.6 150.9
hacHAC+:indoleIndole+ 31.9 98.91 0.32 0.752 -177.8 241.5

6.5 aba

y <- "aba"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 6: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 19.2 3.23 5.94 0.000 12.3 26.0
hacHAC+ 13.3 4.56 2.91 0.010 3.6 23.0
indoleIndole+ 8.8 4.56 1.92 0.072 -0.9 18.5
hacHAC+:indoleIndole+ 12.8 6.45 1.99 0.064 -0.9 26.5

6.6 zm_pr1

y <- "zm_pr1"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 7: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 1.2 0.27 4.44 0.000 0.6 1.8
hacHAC+ -0.1 0.38 -0.25 0.809 -0.9 0.7
indoleIndole+ -0.4 0.38 -1.00 0.334 -1.2 0.4
hacHAC+:indoleIndole+ 0.2 0.53 0.40 0.694 -0.9 1.3

6.7 zm_pr5

y <- "zm_pr5"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 8: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 1.1 0.23 4.84 0.000 0.6 1.6
hacHAC+ 0.0 0.32 0.01 0.992 -0.7 0.7
indoleIndole+ 0.0 0.32 0.01 0.996 -0.7 0.7
hacHAC+:indoleIndole+ -0.2 0.46 -0.37 0.714 -1.1 0.8

6.8 zm_lox

y <- "zm_lox"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 9: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 1.0 0.14 7.30 0.000 0.7 1.3
hacHAC+ 0.7 0.20 3.33 0.004 0.2 1.1
indoleIndole+ 0.6 0.20 3.11 0.007 0.2 1.0
hacHAC+:indoleIndole+ 0.5 0.28 1.89 0.077 -0.1 1.1

6.9 zm_aos

y <- "zm_aos"
knitr::kable(lm_coef(y),
             digits = c(1,2,2,3,1,1),
             caption = "Coefficient table") %>%
  kable_styling(bootstrap_options = "striped",
                full_width = FALSE,
                position = "center")
Table 10: Coefficient table
Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
(Intercept) 1.1 0.32 3.35 0.004 0.4 1.7
hacHAC+ 1.0 0.45 2.33 0.033 0.1 2.0
indoleIndole+ 2.1 0.45 4.75 0.000 1.2 3.1
hacHAC+:indoleIndole+ 1.7 0.63 2.74 0.015 0.4 3.1