This is a skeletal post to show the equivalency of different ways of thinking about “nested” factors in a mixed model. The data are measures of life history traits in lice that infect salmon. The treatment is the source of lice – from farmed raised or wild salmon. The response variable analyzed here is the date of production of the egg string in units of “days post-infection”. A second fixed-factor is the reproductive event number (the lice went through five successive rounds of reproduction following infection). The salmon were raised in 30 tanks in each of two rooms. There were multiple measures for each tank, so there is non-independence within tanks. There were also differences in setup between rooms so we expect non-independence within rooms. There are two ways to think about this non-independence

the classical “nested” way of thinking: tanks is “nested within” room.
the lme4 (Bates xxx) way of thinking: the single nested effect is decomposed into two random effects: room and a factor of the combinations of tanks and rooms.

Source article Mennerat, A., Ugelvik, M.S., Håkonsrud Jensen, C. and Skorping, A., 2017. Invest more and die faster: The life history of a parasite on intensive farms. Evolutionary applications, 10(9), pp.890-896.

Source data Data from: Invest more and die faster: the life history of a parasite on intensive farms

Setup

library(janitor)
library(data.table)
library(lmerTest)
library(here)
here <- here::here

folder <- "Data from Invest more and die faster - the life history of a parasite on intensive farms"

Import

fn <- "timing_ES1to5.txt"
file_path <- here('data',folder, fn)
file_path <- paste('../data', folder, fn, sep="/")
timing <- clean_names(fread(file_path))
setnames(timing, old="exp", new="room")

Models

I follow the authors of the source paper and use a generalized linear mixed model with Poisson family and log-link. The column “tank” contains the tank ID 1-30 in each room. As a consequence a tank with id “6” is not unique but occurs in both room 1 and room 2. The column “tank2” has a unique name for every tank. The value is simply the combination of “room” and “tank”.

as nested using “tank” nested within “room”

m2 <- glmer(days_pi ~ status*es + (1|room/tank), family = "poisson", data = timing)
coef(summary(m2))

##                   Estimate  Std. Error   z value   Pr(>|z|)
## (Intercept)    3.981916916 0.072950431 54.583871 0.00000000
## statuswild    -0.037870115 0.018418902 -2.056046 0.03977806
## es             0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es  0.006370614 0.005207202  1.223424 0.22116972

as two random intercepts (using lme4 to create the combinations)

m3 <- glmer(days_pi ~ status*es + (1|room) + (1|room:tank), family = "poisson", data = timing)
coef(summary(m3))

##                   Estimate  Std. Error   z value   Pr(>|z|)
## (Intercept)    3.981916916 0.072950431 54.583871 0.00000000
## statuswild    -0.037870115 0.018418902 -2.056046 0.03977806
## es             0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es  0.006370614 0.005207202  1.223424 0.22116972

A safer (lme4) way to create the combinations of “room” and “tank”:

The room:tank specification may create combinations that do not exist in the data. A safer way to do this is to create a column that combines “room” and “tank” so that each tank has a unique name. This is the same concept as “tank2” in the original data.

# to emphasize that this only works if room and tank are factors
timing[, room:=factor(room)]
timing[, tank:=factor(tank)]
timing[, tank_id := factor(room:tank)] # same concept as "tank2"
m4 <- glmer(days_pi ~ status*es + (1|room) + (1|tank_id), family = "poisson", data = timing)
coef(summary(m4))

##                   Estimate  Std. Error   z value   Pr(>|z|)
## (Intercept)    3.981916916 0.072950431 54.583871 0.00000000
## statuswild    -0.037870115 0.018418902 -2.056046 0.03977806
## es             0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es  0.006370614 0.005207202  1.223424 0.22116972

as two random intercepts using “tank2”

“tank2” is the author’s version of the combination, created using something like paste0(room,tank). This is effectively like the “safe” way to create the combinations above.

m1 <- glmer(days_pi ~ status*es + (1|room) + (1|tank2), family = "poisson", data = timing)
coef(summary(m1))

##                   Estimate  Std. Error   z value   Pr(>|z|)
## (Intercept)    3.981916916 0.072950431 54.583871 0.00000000
## statuswild    -0.037870115 0.018418902 -2.056046 0.03977806
## es             0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es  0.006370614 0.005207202  1.223424 0.22116972

Don’t do this

This doesn’t work because the column “tank” contains the same set of tank IDs in each room, that is, tank ID is not unique. This implies that tank “2” is the same in both rooms or that “2” has the “same” meaning.

m5 <- glmer(days_pi ~ status*es + (1|room) + (1|tank), family = "poisson", data = timing)
coef(summary(m5))

##                   Estimate  Std. Error   z value   Pr(>|z|)
## (Intercept)    3.981048359 0.073559994 54.119748 0.00000000
## statuswild    -0.035657150 0.018364165 -1.941670 0.05217705
## es             0.135696317 0.003407198 39.826363 0.00000000
## statuswild:es  0.006427319 0.005201244  1.235727 0.21655993

"Nested" random factors in mixed (multilevel or hierarchical) models

Setup

Import

Models

as nested using “tank” nested within “room”

as two random intercepts (using lme4 to create the combinations)

A safer (lme4) way to create the combinations of “room” and “tank”:

as two random intercepts using “tank2”

Don’t do this

R doodles. Some ecology. Some physiology. Much fake data.

"Nested" random factors in mixed (multilevel or hierarchical) models

Setup

Import

Models

as nested using “tank” nested within “room”

as two random intercepts (using lme4 to create the combinations)

A safer (lme4) way to create the combinations of “room” and “tank”:

as two random intercepts using “tank2”

Don’t do this

R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns