# Expected covariances in a causal network

This is a skeleton post Standardized variables (Wright’s rules) n <- 10^5 # z is the common cause of g1 and g2 z <- rnorm(n) # effects of z on g1 and g2 b1 <- 0.7 b2 <- 0.7 r12 <- b1*b2 g1 <- b1*z + sqrt(1-b1^2)*rnorm(n) g2 <- b2*z + sqrt(1-b2^2)*rnorm(n) var(g1) # E(VAR(g1)) = 1 ## [1] 0.9970283 var(g2) # E(VAR(g2)) = 1 ## [1] 0.9986488 cor(g1, g2) # E(COR(g1,g2)) = b1*b2 ## [1] 0.

# Compute a random data matrix (fake data) without rmvnorm

This is a skeleton post until I have time to flesh it out. The post is motivated by a question on twitter about creating fake data that has a covariance matrix that simulates a known (given) covariance matrix that has one or more negative (or zero) eigenvalues. First, some libraries library(data.table) library(mvtnorm) library(MASS) Second, some functions… random.sign <- function(u){ # this is fastest of three out <- sign(runif(u)-0.5) #randomly draws from {-1,1} with probability of each = 0.

# Reporting effects as relative differences...with a confidence interval

Researchers frequently report results as relative effects, for example, “Male flies from selected lines had 50% larger upwind flight ability than male flies from control lines (Control mean: 117.5 cm/s; Selected mean 176.5 cm/s).” where a relative effect is $$$100 \frac{\bar{y}_B - \bar{y}_A}{\bar{y}_A}$$$ If we are to follow best practices, we should present this effect with a measure of uncertainty, such as a confidence interval. The absolute effect is 59.

# Interaction plots with ggplot2

ggpubr is a fantastic resource for teaching applied biostats because it makes ggplot a bit easier for students. I’m not super familiar with all that ggpubr can do, but I’m not sure it includes a good “interaction plot” function. Maybe I’m wrong. But if I’m not, here is a simple function to create a gg_interaction plot. The gg_interaction function returns a ggplot of the modeled means and standard errors and not the raw means and standard errors computed from each group independently.

# Textbook error 101 -- A low p-value for the full model does not mean that the model is a good predictor of the response

On page 606, of Lock et al “Statistics: Unlocking the Power of Data”, the authors state in item D “The p-value from the ANOVA table is 0.000 so the model as a whole is effective at predicting grade point average.” Ah no. library(data.table) library(mvtnorm) rho <- 0.5 n <- 10^5 Sigma <- diag(2) Sigma[1,2] <- Sigma[2,1] <- rho X <- rmvnorm(n, mean=c(0,0), sigma=Sigma) colnames(X) <- c("X1", "X2") beta <- c(0.01, -0.

# A simple ggplot of some measure against depth

set up The goal is to plot the measure of something, say O2 levels, against depth (soil or lake), with the measures taken on multiple days library(ggplot2) library(data.table) First – create fake data depths <- c(0, seq(10,100, by=10)) dates <- c("Jan-18", "Mar-18", "May-18", "Jul-18") x <- expand.grid(date=dates, depth=depths) n <- nrow(x) head(x) ## date depth ## 1 Jan-18 0 ## 2 Mar-18 0 ## 3 May-18 0 ## 4 Jul-18 0 ## 5 Jan-18 10 ## 6 Mar-18 10 X <- model.

# Should the model-averaged prediction be computed on the link or response scale in a GLM?

[updated to include additional output from MuMIn, BMA, and BAS] This post is a follow up to my inital post, which was written as as a way for me to pen my mental thoughts on the recent review of “Model averaging in ecology: a review of Bayesian, information‐theoretic and tactical approaches for predictive inference”. It was also written without contacting and discussing the issue with the authors. This post benefits from a series of e-mails with the lead author Carsten Dormann and the last author Florian Hartig.
• page 1 of 3

#### R doodles. Some ecology. Some physiology. Much fake data.

Thoughts on R, statistical best practices, and teaching applied statistics to Biology majors

Jeff Walker, Professor of Biological Sciences

University of Southern Maine, Portland, Maine, United States