What is an R doodle?

An R doodle is a short script to check my intuition or understanding. Almost always, this involves generating fake data. I might create an R doodle when I’m reviewing a manuscript or reading a published paper and want to check whether the statistical analysis is doing what the authors think it is doing. Or I might create one simply to figure out what the authors are doing. Or I might be teaching a method and create an R doodle to understand how the method behaves given different input (fake) data sets. Or I might create an R doodle to help others understand: students, a blog commenter, or my department colleagues.
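To make the idea concrete, here is a minimal sketch of what a doodle might look like. It is a hypothetical illustration, not one of the archived doodles: the sample size, number of iterations, and means are arbitrary assumptions. It generates many fake data sets with no true treatment effect and checks whether a two-sample t-test rejects at roughly its nominal 5% rate.

```r
# Hypothetical R doodle: does a two-sample t-test keep its nominal 5%
# false-positive rate when there is no true effect and n is small?
set.seed(1)
n_iter <- 2000  # number of fake data sets (arbitrary choice)
n <- 5          # sample size per group (arbitrary choice)
p_values <- numeric(n_iter)

for (i in seq_len(n_iter)) {
  # fake data: both groups drawn from the same normal distribution
  control <- rnorm(n, mean = 10, sd = 2)
  treatment <- rnorm(n, mean = 10, sd = 2)
  p_values[i] <- t.test(treatment, control, var.equal = TRUE)$p.value
}

# proportion of "significant" results; should be close to 0.05
type1_error <- mean(p_values < 0.05)
type1_error
```

Swapping in a different data-generating model (unequal variances, skewed errors, a covariate affected by treatment) and rerunning is exactly the kind of quick check a doodle is for.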

Why this blog?

My R doodles are reactionary: I read something and start coding in RStudio. I keep an organized folder of these on my Google Drive. Some I have expanded into published papers (here, here, and here), but I also have a file drawer full of effectively completed manuscripts that are unlikely to be submitted. Writing R doodles, and especially the manuscripts, consumes a large fraction of my professional time, but others don’t gain from this work unless I publish the results. I started this blog to archive my R doodles going forward. Some of these will be expanded into longer scripts and posts. Some will be fleshed out into manuscripts for archiving at PeerJ Preprints or bioRxiv (https://www.biorxiv.org). Others will be expanded into Shiny apps. Some I might even submit for publication in a journal. Ultimately, though, the goal is to have an archive for teaching and learning.

Caveats

My statistics training was classical biostatistics: both Robert Sokal and James Rohlf were my principal mentors during my PhD years. But I have zero training in mathematics, mathematical statistics, or computer science, and this will be evident in the way I think and write. Statisticians frequently comment on the muddled thinking found in statistics textbooks written by biologists (or by non-statisticians generally). That said, it is pretty easy to find muddled thinking in textbooks written by statisticians, and even easier to find it in biology papers with statisticians on the author list. Statistical thinking just isn’t that easy.

R doodles. Some ecology. Some physiology. Much fake data.

How to make plots with factor levels below the x-axis (bench-biology style)

What is an interaction?

How to estimate synergism or antagonism

Type 3 ANOVA in R -- an easy way to publish wrong tables

Linear models with a covariate ("ANCOVA")

Normal Q-Q plots - what is the robust line and should we prefer it?

ANCOVA when the covariate is a mediator affected by treatment

Bootstrap confidence intervals when sample size is really small

What is the consequence of normalizing by each case in the control?

Melting a list of columns