This is a very quick post as a comment on the statement

“For linear models, predicting from a parameter-averaged model is mathematically identical to averaging predictions, but this is not the case for non-linear models…For non-linear models, such as GLMs with log or logit link functions g(x)¹, such coefficient averaging is not equivalent to prediction averaging.”

from the supplement of Dormann et al., “Model averaging in ecology: a review of Bayesian, information‐theoretic and tactical approaches for predictive inference.”

This is within the context of problems with model-averaged coefficients. I think the authors are arguing that, for GLMs, we cannot use model-averaged predictions to justify interpreting model-averaged coefficients, since model-averaged coefficients are not the same thing that produced the model-averaged predictions. I think this is wrong: model-averaged coefficients are exactly the thing that produced the model-averaged predictions if both are averaged on the link scale.

Here is their equation S2:

\[\begin{equation} \frac{1}{m} \sum_{i=1}^m{g^{-1}(Xb_i)} \ne g^{-1}(X \frac{\sum_{i=1}^m{b_i}}{m}) \end{equation}\]

This equation is true but, I think, irrelevant. The LHS averages the predictions, but the average is on the response scale. The RHS averages the coefficients, but the average is on the link scale. I would agree with the statement if this were how researchers are averaging (although I would think naive researchers would be more likely to average coefficients on the response scale and predictions on the link scale).
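
To see the inequality concretely, here is a minimal numeric sketch with a tiny design matrix and made-up coefficients from two hypothetical models (b1 and b2 are invented purely for illustration; plogis() is base R’s inverse logit):

# two hypothetical models of the same data; coefficients invented for illustration
X <- cbind(1, c(-1, 0, 1)) # tiny design matrix: intercept and one predictor
b1 <- c(0, 2) # coefficients from hypothetical model 1
b2 <- c(1, -0.5) # coefficients from hypothetical model 2
lhs <- (plogis(X %*% b1) + plogis(X %*% b2))/2 # average the predictions on the response scale
rhs <- plogis(X %*% ((b1 + b2)/2)) # predict from coefficients averaged on the link scale
cbind(lhs, rhs) # the columns differ, as S2 states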

I would think the LHS is the incorrect method for computing predictions on the response scale. With all averaging on the link scale, equation S2 becomes

\[\begin{equation} g^{-1}(\frac{1}{m} \sum_{i=1}^m{(Xb_i)}) = g^{-1}(X \frac{\sum_{i=1}^m{b_i}}{m}) \end{equation}\]

The RHS is the same as in their S2. The LHS computes the prediction for each model on the link scale, averages these on the link scale, and then back-transforms to the response scale. Both the LHS and the RHS are correct ways to get the model-averaged predictions on the response scale. And, because they are equivalent, they show that “predicting from a parameter-averaged model [RHS] is mathematically identical to averaging predictions [LHS].”
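
Continuing the little sketch from above, if the predictions are averaged on the link scale before back-transforming, the two sides agree exactly:

lhs2 <- plogis((X %*% b1 + X %*% b2)/2) # average the predictions on the link scale, then back-transform
rhs2 <- plogis(X %*% ((b1 + b2)/2)) # back-transform the prediction from the averaged coefficients
all.equal(lhs2, rhs2) # TRUE -- identical up to floating point error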

In short: model-averaged coefficients should be averaged on the link scale. It would indeed not be “correct” to interpret coefficients that are model-averaged on the response scale, although, as Dormann et al. state, “In practice, however, both log and logit are sufficiently linear, making coefficient averaging an acceptable approximation.” If the issue is that researchers are model averaging coefficients on the response scale, this isn’t an issue with model averaging coefficients in general, but only with model averaging coefficients on the response scale.
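
Nothing here depends on the equal 1/m weights: a weighted average is still a linear operation, so the corrected S2 also holds for Akaike-type model weights w_i (summing to one), as used by MuMIn and in the doodle below:

\[\begin{equation} g^{-1}(\sum_{i=1}^m{w_i X b_i}) = g^{-1}(X \sum_{i=1}^m{w_i b_i}) \end{equation}\]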

Here is a short R doodle showing that “predicting from a parameter-averaged model is mathematically identical to averaging predictions” for a GLM, as long as one does all the averaging on the link scale.

library(MuMIn)
library(ggplot2)
library(data.table)
expit <- function(x) {exp(x)/(1+exp(x))} # the inverse logit function. This generates the probability of the event p
logit <- function(p) {log(p/(1-p))} # the log of the odds or "logodds" given the probability of an event p. This is NOT the odds ratio, which is the ratio of two odds.
p2odd <- function(p) {p/(1-p)} # the odds of the probability of an event
odd2p <- function(x) {x/(1+x)} # the probability associated with an odds
n <- 100
z <- rnorm(n)
# create two correlated variables, x1 and x2, with E[cor] = zeta^2
zeta <- 0.7
sigma <- 0.3
x1 <- zeta*z + sqrt(1-zeta^2)*rnorm(n)
x2 <- zeta*z + sqrt(1-zeta^2)*rnorm(n)

# create a performance measure as function of x1 and x2
perf <- x1 + x2 + rnorm(n)*sigma # coefficients both = 1

# transform performance to probability of survival

# create fake data
p.survival <- expit(perf)
y <- rbinom(n, 1, p.survival)
dt <- data.table(y=y,x1=x1,x2=x2)

# fit
fit <- glm(y ~ x1 + x2, data=dt, family=binomial(link='logit'), na.action=na.fail)

# all subsets regression and model average using MuMIn
fit.all <- dredge(fit)
## Fixed term is "(Intercept)"
fit.avg <- model.avg(fit.all, fit=TRUE) # coefficients are on link scale

model_set <- get.models(fit.all, subset=TRUE) # all models
X <- model.matrix(fit)

# (0) MuMIn predict
yhat0.response_scale <- predict(fit.avg, backtransform=TRUE)

# RHS eq. S2
# verify "by hand" by predicting on link scale then back transforming to response scale
yhat0.link_scale <- predict(fit.avg, backtransform=FALSE)
yhat0.response_scale2 <- expit(yhat0.link_scale) 
head(data.table(yhat0.response_scale, yhat0.response_scale2)) # these should be equal
##    yhat0.response_scale yhat0.response_scale2
## 1:            0.1465589             0.1465589
## 2:            0.1698607             0.1698607
## 3:            0.7997893             0.7997893
## 4:            0.2284307             0.2284307
## 5:            0.4392947             0.4392947
## 6:            0.2359691             0.2359691
yhat0 <- yhat0.response_scale # predictions using MuMIn package


# (1) use model averaged B to get prediction on link scale. back-transform to response scale
# this is the RHS of eq. S2 in the appendix
# I use MuMIn model.avg function to model average coefficients on link scale
# then I compute predictions on link scale
# then I back-transform predictions to response scale 
# This should equal yhat0 from above
# the 'full' row holds the full model-averaged coefficients (terms absent from a model are set to zero before averaging)
b <- model.avg(model_set)$coefficients['full',][colnames(X)]
yhat1.link_scale <- X%*%b
yhat1 <- expit(yhat1.link_scale)
MSE1 <- sqrt(mean((yhat1 - dt[, y])^2)) # root mean squared error

# (2) a variant of yhat1 and yhat0 - I am "by hand" computing the average prediction on the link scale
# then back-transforming to response scale
# this can be thought of as the corrected LHS of S2
w <- fit.all$weight
yhat2.each_model.link_scale <- sapply(model_set, predict) # predict() on a glm defaults to the link scale
yhat2.link_scale <- yhat2.each_model.link_scale%*%w
yhat2 <- expit(yhat2.link_scale)
MSE2 <- sqrt(mean((yhat2 - dt[, y])^2)) # root mean squared error

# (3) This is the "incorrect" method of model averaging
# LHS of S2
# model average predictions on response scale (i.e. back-transform each prediction to response scale and then model average)
# I need the first two calculations from #(2) above to get yhat2.each_model.link_scale
yhat3.each_model.response_scale <- expit(yhat2.each_model.link_scale)
yhat3 <- yhat3.each_model.response_scale%*%w
MSE3 <- sqrt(mean((yhat3 - dt[, y])^2)) # root mean squared error

# Predicted values computed 4 different ways
head(data.table(yhat0=yhat0, yhat1=yhat1[,1], yhat2=yhat2[,1], yhat3=yhat3[,1]))
##        yhat0     yhat1     yhat2     yhat3
## 1: 0.1465589 0.1465589 0.1465589 0.1481774
## 2: 0.1698607 0.1698607 0.1698607 0.1703643
## 3: 0.7997893 0.7997893 0.7997893 0.7966822
## 4: 0.2284307 0.2284307 0.2284307 0.2293946
## 5: 0.4392947 0.4392947 0.4392947 0.4393425
## 6: 0.2359691 0.2359691 0.2359691 0.2367279
#yhat0 = MuMIn model averaged predictions (correct method)
#yhat1 = RHS of equation s2 in appendix (correct method)
#yhat2 = Corrected LHS of s2 in appendix (correct method)
#yhat3 = LHS of s2 in appendix (incorrect?)
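
As a quick check, one could also compare the prediction error of the three “by hand” methods. The values depend on the simulated data, but MSE1 and MSE2 should be identical (they are the same computation, as shown above), and MSE3 should differ only slightly:

# root mean squared errors of the three methods (values vary with the random draw)
c(MSE1=MSE1, MSE2=MSE2, MSE3=MSE3)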

  1. Understood, but awkward and confusing. It is unconventional to call a GLM a non-linear model, especially given the name “Generalized Linear Model”.