Some background (due to Sewall Wright’s method of path analysis)

Given a generating model:

y = β_{0} + β_{1} x_{1} + β_{2} x_{2} + β_{3} x_{3}

where $x_{3} = x_{1} x_{2}$ ; that is, it is an interaction variable.

The total effect of $x_{1}$ on $y$ is $β_{1} + \frac{C O V (x_{1}, x_{2})}{V A R (x_{1})} β_{2} + \frac{C O V (x_{1}, x_{3})}{V A R (x_{1})} β_{3}$ .

If $x_{3}$ (the interaction) is missing, its component on the total efffect is added to the coefficient of $x_{1}$ . This added component is the bias due to the omitted interaction.

Definitions

This computation assumes an interaction effect is defined as the “leftover” residual after adding up the intercept plus additive effects – that is, the coefficient estimated using dummy coding (which is R’s default coding). See Is the power to test an interaction effect less than that for a main effect? for other definitions/parameterizations of an interaction.

Libraries

I use data.table

library(data.table)

Factorial design with ommitted interaction

Let’s simulate this with a $2 \times 2$ factorial design. In a balanced fatorial design, $\frac{C O V (x_{1}, x_{3})}{V A R (x_{1})} = 0.5$ , so the bias is $\frac{β_{3}}{2}$

n <- 10^6 # a high n is used so that the estimate is very close to the true expected value
beta_0 <- 0
beta_1 <- 1
beta_2 <- 0.5
beta_3 <- 0.5
beta <- c(beta_0, beta_1, beta_2, beta_3)
sigma <- 1

fd <- data.table(
  x1=factor(rep(c("CnA", "TrA"), each=2*n)),
  x2=factor(rep(rep(c("CnB", "TrB"), each=n), 2))
)
X <- model.matrix(~x1*x2, data=fd)
cov_X <- cov(X)
fd[, y:=(X%*%beta)[,1] + rnorm(n*4)]

The bias is

b <- cov_X[4,2]/cov_X[2,2] # this will be 0.5 in balanced design
b*beta_3

## [1] 0.25

The expected coefficient given the known true effect plus the bias is

beta_1 + b*beta_3

## [1] 1.25

The coefficient of $x_{1}$ in the linear model with the missing interaction estimates this biased effect.

coef(lm(y ~ x1 + x2, data=fd))[2]

##    x1TrA 
## 1.248929

Two continuous $X$ with interaction

n <- 10^6
# generate two correlated x
rho <- 0.6 # true correlation between the two x variables
z <- rnorm(n)
sigma_x <- sqrt(1-rho)
x1 <- sqrt(rho)*z + rnorm(n, sd=sigma_x) # expected variance is 1
x2 <- sqrt(rho)*z + rnorm(n, sd=sigma_x) # expected variance is 1
x3 <- x1*x2

beta_0 <- 0
beta_1 <- 1
beta_2 <- 0.5
beta_3 <- 0.5
beta <- c(beta_0, beta_1, beta_2, beta_3)
sigma <- 1

fd <- data.table(
  x1=x1,
  x2=x2
)
X <- model.matrix(~x1*x2, data=fd)
cov_X <- cov(X)
fd[, y:=(X%*%beta)[,1] + rnorm(n)]

The bias is

b <- cov_X[4,2]/cov_X[2,2] # this will be 0.5 in balanced design
b*beta_3

## [1] -0.001261114

huh. Is this generally the case that the regression of the interaction on the main variable is near zero? If it is, then an omitted interaction would have little bias, as here.

The expected coefficient given the known true effect plus the bias is

beta_1 + b*beta_3

## [1] 0.9987389

The coefficient of $x_{1}$ in the linear model with the missing interaction estimates this biased effect.

coef(lm(y ~ x1 + x2, data=fd))[2]

##        x1 
## 0.9966902

What is the bias in the estimation of an effect given an omitted interaction term?

Some background (due to Sewall Wright’s method of path analysis)

Definitions

Libraries

Factorial design with ommitted interaction

Two continuous $X$ with interaction

R doodles. Some ecology. Some physiology. Much fake data.

What is the bias in the estimation of an effect given an omitted interaction term?

Some background (due to Sewall Wright’s method of path analysis)

Definitions

Libraries

Factorial design with ommitted interaction

Two continuous XX with interaction

R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns

Two continuous $X$ with interaction