This doodle was motivated Jake Westfall’s answer to a Cross-Validated question.

The short answer is yes but most R scripts that I’ve found on the web are unsatisfying because only the t-value reproduces, not the df and p-value. Jake notes the reason for this in his answer on Cross-Validated.

To get the adjusted df, and the p-value associated with this, one can use the emmeans package by Russell Lenth, as he notes here. Compare Tables 1 and 2 below to see how this adjustment can matter.

Setup

Libraries

library(data.table)
library(emmeans)
library(nlme)

Functions

reglance <- function(x){
  if(class(x)=="htest"){
    t <- x$statistic
    df <- x$parameter
    p <- x$p.value
  }
   if(class(x)=="lm"){
    df <- x$df.residual
    t <- coef(summary(x))["groupB", "t value"]
    p <- coef(summary(x))["groupB", "Pr(>|t|)"]
  }
 if(class(x)=="gls"){
    N <- summary(x)$dims$N
    k <- 2
    df <- N - k
    t <- summary(x)$tTable["groupB", "t-value"]
    p <- summary(x)$tTable["groupB", "p-value"]
  }
  if(class(x)=="emmGrid"){
    df <- summary(x)[1, "df"]
    t <- summary(x)[1, "t.ratio"]
    p <- summary(x)[1, "p.value"]
  }
  return(c(df=df, t=t, p=p))
}

Motivating fake data

This is Jake’s script, except that I’ve assigned the model objects and use these to make a table. I’ve also included the emmeans functions to generate the t-test using the gls model output.

set.seed(497203)
dat <- data.frame(group=rep.int(c("A","B"), c(10,20)),
  y = rnorm(30, mean=rep.int(c(0,1), c(10,20)), sd=rep.int(c(1,2),c(10,20))))

# the t-statistic assuming equal variances
t.student <- t.test(y ~ group, data = dat, var.equal = TRUE)
m1 <- lm(y ~ group, data = dat)

# the t-statistic not assuming equal variances
t.welch <- t.test(y ~ group, data = dat, var.equal = FALSE)
m2 <- gls(y ~ group, data = dat, 
          weights=varIdent(form = ~ 1 | group))

m2.emm <- contrast(emmeans(m2, specs="group"))

t_table <- data.table(rbind(
  c("Student t", reglance(t.student)),
  c("lm", reglance(m1)),
  c("Welch t", reglance(t.welch)),
  c("gls", reglance(m2)),
  c("emmeans", reglance(m2.emm))
))
colnames(t_table) <- c("method", "df", "t", "p")
knitr::kable(t_table, caption="Original fake data")
Table 1: Original fake data
method df t p
Student t 28 -1.55068268697165 0.132208346462522
lm 28 1.55068268697165 0.132208346462522
Welch t 27.6771984386631 -1.87241001836005 0.0717506959077031
gls 28 1.87241001838309 0.0716278864212684
emmeans 27.6856538252465 -1.87241001838309 0.071747442401355

The values aren’t rounded because I want to see how close the gls and Welch t are. As Jake notes, the GLS model doesn’t use the Satterthwaite df, so only the t reproduces. That said, in this example, the non-adjusted and Satterthwaite df are very close. Consequently, in this example, the difference in p-values are trivial with respect to how we might interpret a result. The emmeans computation of the df and p-value are not equal but are satisfyingly close to those of the Welch t, at least for me, and at least in this example.

Fake data with small n

Here I repeat Jake’s example but with a smaller sample and I reverse which sample is associated with the larger variance.

set.seed(497203)
n1 <- 8
n2 <- 4
dat <- data.frame(group=rep.int(c("A","B"), c(n1,n2)),
  y = rnorm(n1+n2, mean=rep.int(c(0,1), c(n1,n2)), sd=rep.int(c(1,2),c(n1,n2))))

# the t-statistic assuming equal variances
t.student <- t.test(y ~ group, data = dat, var.equal = TRUE)
m1 <- lm(y ~ group, data = dat)

# the t-statistic not assuming equal variances
t.welch <- t.test(y ~ group, data = dat, var.equal = FALSE)
m2 <- gls(y ~ group, data = dat, 
          weights=varIdent(form = ~ 1 | group))

m2.emm <- contrast(emmeans(m2, specs="group"))

t_table <- data.table(rbind(
  c("Student t", reglance(t.student)),
  c("lm", reglance(m1)),
  c("Welch t", reglance(t.welch)),
  c("gls", reglance(m2)),
  c("emmeans", reglance(m2.emm))
))
colnames(t_table) <- c("method", "df", "t", "p")
knitr::kable(t_table, caption="Small n fake data")
Table 2: Small n fake data
method df t p
Student t 10 -2.3012090076821 0.0441633525165716
lm 10 2.3012090076821 0.0441633525165716
Welch t 3.91598345952776 -1.87308830595972 0.135881711655436
gls 10 1.87308831515667 0.0905471567272453
emmeans 3.9158589862009 -1.87308831515667 0.13588402534431

In this example, the lack of the Satterthwaite adjustment of the df matters. The emmeans computation of the df and p-value is very satisfying.