"Nested" random factors in mixed (multilevel or hierarchical) models
This is a skeletal post to show the equivalency of different ways of thinking about “nested” factors in a mixed model. The data are measures of life history traits in lice that infect salmon. The treatment is the source of lice – from farmed raised or wild salmon. The response variable analyzed here is the date of production of the egg string in units of “days post-infection”. A second fixed-factor is the reproductive event number (the lice went through five successive rounds of reproduction following infection). The salmon were raised in 30 tanks in each of two rooms. There were multiple measures for each tank, so there is non-independence within tanks. There were also differences in setup between rooms so we expect non-independence within rooms. There are two ways to think about this non-independence
- the classical “nested” way of thinking: tanks is “nested within” room.
- the lme4 (Bates xxx) way of thinking: the single nested effect is decomposed into two random effects: room and a factor of the combinations of tanks and rooms.
Source data Data from: Invest more and die faster: the life history of a parasite on intensive farms
Setup
library(janitor)
library(data.table)
library(lmerTest)
library(here)
here <- here::here
folder <- "Data from Invest more and die faster - the life history of a parasite on intensive farms"
Import
fn <- "timing_ES1to5.txt"
file_path <- here('data',folder, fn)
file_path <- paste('../data', folder, fn, sep="/")
timing <- clean_names(fread(file_path))
setnames(timing, old="exp", new="room")
Models
I follow the authors of the source paper and use a generalized linear mixed model with Poisson family and log-link. The column “tank” contains the tank ID 1-30 in each room. As a consequence a tank with id “6” is not unique but occurs in both room 1 and room 2. The column “tank2” has a unique name for every tank. The value is simply the combination of “room” and “tank”.
as nested using “tank” nested within “room”
m2 <- glmer(days_pi ~ status*es + (1|room/tank), family = "poisson", data = timing)
coef(summary(m2))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.981916916 0.072950431 54.583871 0.00000000
## statuswild -0.037870115 0.018418902 -2.056046 0.03977806
## es 0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es 0.006370614 0.005207202 1.223424 0.22116972
as two random intercepts (using lme4 to create the combinations)
m3 <- glmer(days_pi ~ status*es + (1|room) + (1|room:tank), family = "poisson", data = timing)
coef(summary(m3))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.981916916 0.072950431 54.583871 0.00000000
## statuswild -0.037870115 0.018418902 -2.056046 0.03977806
## es 0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es 0.006370614 0.005207202 1.223424 0.22116972
A safer (lme4) way to create the combinations of “room” and “tank”:
The room:tank
specification may create combinations that do not exist in the data. A safer way to do this is to create a column that combines “room” and “tank” so that each tank has a unique name. This is the same concept as “tank2” in the original data.
# to emphasize that this only works if room and tank are factors
timing[, room:=factor(room)]
timing[, tank:=factor(tank)]
timing[, tank_id := factor(room:tank)] # same concept as "tank2"
m4 <- glmer(days_pi ~ status*es + (1|room) + (1|tank_id), family = "poisson", data = timing)
coef(summary(m4))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.981916916 0.072950431 54.583871 0.00000000
## statuswild -0.037870115 0.018418902 -2.056046 0.03977806
## es 0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es 0.006370614 0.005207202 1.223424 0.22116972
as two random intercepts using “tank2”
“tank2” is the author’s version of the combination, created using something like paste0(room,tank)
. This is effectively like the “safe” way to create the combinations above.
m1 <- glmer(days_pi ~ status*es + (1|room) + (1|tank2), family = "poisson", data = timing)
coef(summary(m1))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.981916916 0.072950431 54.583871 0.00000000
## statuswild -0.037870115 0.018418902 -2.056046 0.03977806
## es 0.135835163 0.003406919 39.870382 0.00000000
## statuswild:es 0.006370614 0.005207202 1.223424 0.22116972
Don’t do this
This doesn’t work because the column “tank” contains the same set of tank IDs in each room, that is, tank ID is not unique. This implies that tank “2” is the same in both rooms or that “2” has the “same” meaning.
m5 <- glmer(days_pi ~ status*es + (1|room) + (1|tank), family = "poisson", data = timing)
coef(summary(m5))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.981048359 0.073559994 54.119748 0.00000000
## statuswild -0.035657150 0.018364165 -1.941670 0.05217705
## es 0.135696317 0.003407198 39.826363 0.00000000
## statuswild:es 0.006427319 0.005201244 1.235727 0.21655993