# What is the bias in the estimation of an effect given an omitted interaction term?

Some background (following Sewall Wright’s method of path analysis). Given the generating model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

where $x_3 = x_1 x_2$ (that is, $x_3$ is an interaction variable), the total effect of $x_1$ on $y$ is

$$\beta_1 + \frac{\mathrm{COV}(x_1, x_2)}{\mathrm{VAR}(x_1)} \beta_2 + \frac{\mathrm{COV}(x_1, x_3)}{\mathrm{VAR}(x_1)} \beta_3$$

If $x_3$ (the interaction) is omitted from the fitted model, its component of the total effect is absorbed into the coefficient of $x_1$.
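To see the omitted-interaction bias numerically, here is a minimal simulation sketch in Python (standard library only). It assumes $x_1$ and $x_2$ are independent normal variables, with $x_2$ having mean `mu2`; the parameter values and the helper names `cov`, `ols_two_predictors`, and `simulate` are hypothetical. Under independence, $\mathrm{COV}(x_1, x_1 x_2) = \mathrm{VAR}(x_1)\,\mathrm{E}(x_2)$, so the coefficient of $x_1$ from a fit of $y$ on $x_1$ and $x_2$ alone should land near $\beta_1 + \beta_3 \mu_2$.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def ols_two_predictors(x1, x2, y):
    """Slopes of the OLS fit y ~ x1 + x2, solved from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(n=50_000, beta=(0.0, 1.0, 1.0, 0.5), mu2=2.0, seed=1):
    """Simulate y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + noise with independent x1, x2."""
    rng = random.Random(seed)
    b0, b1, b2, b3 = beta
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rng.gauss(mu2, 1.0) for _ in range(n)]
    y = [b0 + b1 * u + b2 * v + b3 * u * v + rng.gauss(0.0, 1.0)
         for u, v in zip(x1, x2)]
    return x1, x2, y


# Fit the model that omits the interaction x3 = x1 * x2 (beta1 = 1, beta3 = 0.5).
for mu2 in (2.0, 0.0):
    x1, x2, y = simulate(mu2=mu2)
    x3 = [u * v for u, v in zip(x1, x2)]
    b1_hat, _ = ols_two_predictors(x1, x2, y)
    path = cov(x1, x3) / cov(x1, x1)  # COV(x1, x3) / VAR(x1), close to mu2 here
    print(f"mu2 = {mu2}: b1_hat = {b1_hat:.3f}, "
          f"beta1 + path * beta3 = {1.0 + path * 0.5:.3f}")
```

Setting `mu2 = 0.0` (centering $x_2$) makes $\mathrm{COV}(x_1, x_3)$ close to zero, so in this setup the omitted interaction no longer shifts the estimate of $\beta_1$.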

# Should we be skeptical of a "large" effect size if p > 0.05?

Motivator: a Twitter comment, “Isn’t the implication that the large effect size is a direct byproduct of the lack of power? i.e. that if the study had more power, the effect size would have been found to be smaller.” [1, 2] A thought: our belief in the magnitude of an observed effect should be based on our priors, which, hopefully, are formed from good mechanistic models and not from sample size. [3]

# Reporting effects as relative differences...with a confidence interval

Researchers frequently report results as relative effects, for example, “Male flies from selected lines had 50% larger upwind flight ability than male flies from control lines (Control mean: 117.5 cm/s; Selected mean: 176.5 cm/s),” where a relative effect is

$$100 \frac{\bar{y}_B - \bar{y}_A}{\bar{y}_A}$$

with $A$ the reference (control) group and $B$ the treatment (selected) group. If we are to follow best practices, we should present this effect with a measure of uncertainty, such as a confidence interval. The absolute effect is $176.5 - 117.5 = 59$ cm/s, so the relative effect is $100 \times 59 / 117.5 \approx 50.2\%$.
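A percentile bootstrap is one common way to attach a confidence interval to a relative effect; the post itself may use a different method. Below is a minimal Python sketch (standard library only) that resamples each group independently and recomputes the relative effect each time. The data are simulated around the reported means with an assumed SD of 30 cm/s and an assumed 20 flies per group; they are not the study's data, and the helper names `relative_effect` and `bootstrap_relative_ci` are hypothetical.

```python
import random
import statistics


def relative_effect(mean_a, mean_b):
    """Percent difference of group B relative to reference group A."""
    return 100 * (mean_b - mean_a) / mean_a


def bootstrap_relative_ci(a, b, n_boot=4000, seed=1):
    """95% percentile-bootstrap CI for the relative effect, resampling each group."""
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        a_star = rng.choices(a, k=len(a))  # resample group A with replacement
        b_star = rng.choices(b, k=len(b))  # resample group B with replacement
        boots.append(relative_effect(statistics.fmean(a_star),
                                     statistics.fmean(b_star)))
    cuts = statistics.quantiles(boots, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Point estimate from the reported means: 100 * 59 / 117.5.
print(f"{relative_effect(117.5, 176.5):.1f}%")  # prints 50.2%

# Hypothetical data: 20 flies per group, SD 30 cm/s (assumed, not from the study).
rng = random.Random(2)
control = [rng.gauss(117.5, 30.0) for _ in range(20)]
selected = [rng.gauss(176.5, 30.0) for _ in range(20)]
est = relative_effect(statistics.fmean(control), statistics.fmean(selected))
lo, hi = bootstrap_relative_ci(control, selected)
print(f"relative effect = {est:.1f}% (95% CI {lo:.1f}% to {hi:.1f}%)")
```

Because the denominator $\bar{y}_A$ is itself estimated, the interval for a relative effect is generally not symmetric about the point estimate; delta-method or Fieller intervals are alternatives worth comparing against the bootstrap.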

# Combining data, distribution summary, model effects, and uncertainty in a single plot

A Harrell plot combines a forest plot of estimated treatment effects and uncertainty, a dot plot of raw data, and a box plot of the distribution of the raw data into a single plot. A Harrell plot encourages best practices such as exploration of the distribution of the data and focus on effect size and uncertainty, while discouraging bad practices such as ignoring distributions and focusing on $p$-values. Consequently, a Harrell plot should replace the bar plots and Cleveland dot plots that are currently ubiquitous in the literature.

#### R doodles. Some ecology. Some physiology. Much fake data.

Thoughts on R, statistical best practices, and teaching applied statistics to Biology majors.

Jeff Walker, Professor of Biological Sciences

University of Southern Maine, Portland, Maine, United States