# What is the bias in the estimation of an effect given an omitted interaction term?

Some background (due to Sewall Wright’s method of path analysis). Given a generating model $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$ where $$x_3 = x_1 x_2$$ (that is, $$x_3$$ is an interaction variable), the total effect of $$x_1$$ on $$y$$ is $$\beta_1 + \frac{\mathrm{COV}(x_1, x_2)}{\mathrm{VAR}(x_1)} \beta_2 + \frac{\mathrm{COV}(x_1, x_3)}{\mathrm{VAR}(x_1)} \beta_3$$. If $$x_3$$ (the interaction) is omitted from the fitted model, its component of the total effect is added to the coefficient of $$x_1$$.
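A quick way to see this is with fake data. The sketch below is only illustrative: the coefficient values, the correlation between $$x_1$$ and $$x_2$$, and the names `beta0` to `beta3`, `full`, `short`, `aux`, and `simple` are all assumptions made up for the example, not part of the original post. It checks two in-sample identities. First, the slope of the simple regression of $$y$$ on $$x_1$$ equals the total effect given above. Second, when $$x_2$$ is kept in the model and only the interaction is dropped, the coefficient of $$x_1$$ absorbs $$\beta_3$$ times the partial coefficient of $$x_1$$ in the regression of $$x_3$$ on $$x_1$$ and $$x_2$$ (the usual omitted-variable decomposition, which is assumed here to be the sense intended).

```r
set.seed(1)
n <- 10^4

# hypothetical generating coefficients, chosen only for illustration
beta0 <- 1
beta1 <- 0.5
beta2 <- 0.3
beta3 <- 0.4

# correlated predictors with non-zero means, so that x1 covaries with x1*x2
x1 <- rnorm(n, mean = 1)
x2 <- 1 + 0.6*(x1 - 1) + sqrt(1 - 0.6^2)*rnorm(n)
x3 <- x1*x2 # the interaction variable
y <- beta0 + beta1*x1 + beta2*x2 + beta3*x3 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3) # generating model
short <- lm(y ~ x1 + x2)     # interaction omitted
aux <- lm(x3 ~ x1 + x2)      # auxiliary regression of the omitted term
simple <- lm(y ~ x1)         # x2 and x3 both omitted
b <- coef(full)

# total effect of x1, using the fitted coefficients of the full model
total <- b["x1"] + cov(x1, x2)/var(x1)*b["x2"] + cov(x1, x3)/var(x1)*b["x3"]
all.equal(unname(coef(simple)["x1"]), unname(total))

# coefficient of x1 when only the interaction is omitted
absorbed <- b["x1"] + b["x3"]*coef(aux)["x1"]
all.equal(unname(coef(short)["x1"]), unname(absorbed))
```

Both comparisons are exact up to floating-point error because the residuals of the full model are uncorrelated with every predictor in it. The predictors are given non-zero means on purpose: with centered, symmetric $$x_1$$ and $$x_2$$, the covariance between $$x_1$$ and $$x_1 x_2$$ is close to zero and there is little to absorb.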

# Expected covariances in a causal network

This is a skeleton post.

Standardized variables (Wright’s rules)

```r
n <- 10^5

# z is the common cause of g1 and g2
z <- rnorm(n)

# effects of z on g1 and g2
b1 <- 0.7
b2 <- 0.7
r12 <- b1*b2

g1 <- b1*z + sqrt(1-b1^2)*rnorm(n)
g2 <- b2*z + sqrt(1-b2^2)*rnorm(n)

var(g1) # E(VAR(g1)) = 1
##  1.001849
var(g2) # E(VAR(g2)) = 1
##  1.006102
cor(g1, g2) # E(COR(g1,g2)) = b1*b2
##  0.
```

#### R doodles. Some ecology. Some physiology. Much fake data.

Thoughts on R, statistical best practices, and teaching applied statistics to Biology majors.

Jeff Walker, Professor of Biological Sciences

University of Southern Maine, Portland, Maine, United States