Overdispersion tests in #rstats

A brief note on overdispersion

Assumptions

Poisson distribution assume variance is equal to the mean.

Quasi-poisson model assumes variance is a linear function of mean.

Negative binomial model assumes variance is a quadratic function of the mean.

rstats implementation

#to test you need to fit a poisson GLM then apply function to this model

library(AER)

dispersiontest(object, trafo = NULL, alternative = c(“greater”, “two.sided”, “less”))

trafo = 1 is linear testing for quasipoisson or you can fit linear equation to trafo as well

#interpretation

c = 0 equidispersion

c > 0 is overdispersed

Resources

  1. Function description from vignette for AER package.
  2. Excellent StatsExchange description of interpretation.

A note on AIC scores for quasi-families in #rstats

A summary note on recent set of #rstats discoveries in estimating AIC scores to better understand a quasipoisson family in GLMS relative to treating data as poisson.

Conceptual GLM workflow rules/guidelines

  1. Data are best untransformed. Fit better model to data.
  2. Select your data structure to match purpose with statistical model.
  3. Use logic and understanding of data not AIC scores to select best model.

(1) Typically, the power and flexibility of GLMs in R (even with base R) get most of the work done for the ecological data we work with within the research team. We prefer to leave data untransformed and simple when possible and use the family or offset arguments within GLMs to address data issues.

(2) Data structure is a new concept to us. We have come to appreciate that there are both individual and population-level queries associated with many of the datasets we have collected.  For our purposes, data structure is defined as the level that the dplyr::group_by to tally or count frequencies is applied. If the ecological purpose of the experiment was defined as the population response to a treatment for instance, the population becomes the sample unit – not the individual organism – and summarised as such. It is critical to match the structure of data wrangled to the purpose of the experiment to be able to fit appropriate models. Higher-order data structures can reduce the likelihood of nested, oversampled, or pseudoreplicated model fitting.

(3) Know thy data and experiment. It is easy to get lost in model fitting and dive deep into unduly complex models. There are tools before model fitting that can prime you for better, more elegant model fits.

Workflow

  1. Wrangle then data viz.
  2. Library(fitdistrplus) to explore distributions.
  3. Select data structure.
  4. Fit models.

Now, specific to topic of AIC scores for quasi-family field studies.

We recently selected quasipoisson for the family to model frequency and count data (for individual-data structures). This addressed overdispersion issues within the data. AIC scores are best used for understanding prediction not description, and logic and exploration of distributions, CDF plots, and examination of the deviance (i.e. not be more than double the degrees of freedom) framed the data and model contexts. To contrast poisson to quasipoisson for prediction, i.e. would the animals respond differently to the treatments/factors within the experiment, we used the following #rstats solutions.

————

#Functions####

#deviance calc

dfun <- function(object) {

with(object,sum((weights * residuals^2)[weights > 0])/df.residual)

}

#reuses AIC from poisson family estimation

x.quasipoisson <- function(…) {

res <- quasipoisson(…)

res$aic <- poisson(…)$aic

res

}

#AIC package that provided most intuitive solution set####

require(MuMIn)

m <- update(m,family=”x.quasipoisson”, na.action=na.fail)

m1 <- dredge(m,rank=”QAIC”, chat=dfun(m))

m1

#repeat as needed to contrast different models

————

Outcomes

This #rstats opportunity generated a lot of positive discussion on data structures, how we use AIC scores, and how to estimate fit for at least this quasi-family model set in as few lines of code as possible.

Resources

  1. An R vignette by Ben Bolker of quasi solutions.
  2. An Ecology article on quasi-possion versus nb.regression for overdispersed count data.
  3. A StatsExchange discussion on AIC scores.

Fundamentals

Same data, different structure, lead to different models. Quasipoisson a reasonable solution for overdispersed count and frequency animal ecology data. AIC scores are a bit of work, but not extensive code, to extract. AIC scores provide a useful insight into predictive capacities if the purpose is individual-level prediction of count/frequency to treatments.

 

Why read a book review when you you can read the book (for free via #oa #openscience)?

 

Reviews, recommendations, and ratings are an important component of contemporary online consumption. Rotten Tomatoes, Metacritic, and Amazon.com reviews and recommendations increasingly shape decisions. Science and technical books are no exception. Increasingly, I have checked reviews for a technical book on a purchasing site even before I downloaded the free book. Too much information, not too little informs many of the competing learning opportunities (#rstats ) for instance).  I used to check the book reviews section in journals and enjoyed reading them (even if I never read the book). My reading habits have changed now, and I rarely read sections from journals and focus only on target papers. This is an unfortunate. I recognize that reviews are important for many science and technical products (not just for books but packages, tools, and approaches). Here is my brief listicle for why reviews are important  for science books and tools.

benefit description
curation Reviews (reviewed) and published in journals engender trust and weight critique to some extent.
developments and rate of change A book review typically frames the topic and offering of a book/tool in the progress of the science.
deeper dive into topic The review usually speaks to a specific audience and helps one decide on fit with needs.
highlights The strengths and limitations of offering are described and can point out pitfalls.
insights and implications Sometimes the implications and meaning of a book or tool is not described directly. Reviews can provide.
independent comment Critics are infamous. In science, the opportunity to offer praise is uncommon and reviews can provide balance.
fits offering into specific scientific subdiscpline Technical books can get lost bceause of the silo effect in the sciences. Reviews can connect disciplines.

Here is an estimate of the frequency of publication of book reviews in some of the journals I read regularly.

journal total.reviews recent
American Naturalist 12967 9
Conservation Biology 1327 74
Journal of Applied Ecology 270 28
Journal of Ecology 182 0
Methods in Ecology & Evolution 81 19
Oikos 211 22

Details of journal data scrape here: https://cjlortie.github.io/book.reviews/

 

A good novel tells us the truth about its hero; but a bad novel tells us the truth about its author.

–Gilbert K. Chesterton

Politics versus ecological and environmental research: invitation to submit to Ideas in Ecology and Evolution #resist

Dear Colleagues,

Ideas in Ecology and Evolution would like to invite you to submit to a special issue entitled
‘Politics versus ecological and environmental research.’

The contemporary political climate has dramatically changed in some nations. Global change marches on, and changes within each and every country influence everyone. We need to march too and can do so in many ways. There has been extensive social media discussion and political activity within the scientific community. One particularly compelling discussion is best captured by this paraphrased exchange.

“Keep politics out of my science feeds.”
“I will keep politics out of my science when politics keeps out of science.”

The latter context has never existed, but the extent of intervention, falsification by non-scientists, blatant non-truths, and threat to science have never been greater in contemporary ecology and environmental science in particular.

Ideas in Ecology and Evolution is an open-access journal. We view the niche of this journal as a home for topics that need discussing for our discipline. Ideas are a beautiful opportunity sometimes lost by the file-drawer problem, and this journal welcomes papers without data to propose new ideas and critically comment on issues relevant to our field both directly and indirectly. Lonnie Aarssen and I are keen to capture some of the ongoing discussion and #resist efforts by our peers. We will rapidly secure two reviews for your contributions to get ideas into print now.

We welcome submissions that address any aspect of politics and ecology and the environment. The papers can include any of (but not limited to) the following formats: commentaries, solution sets, critiques, novel mindsets, strategies to better link ecology/environmental science to political discourse, analyses of political interventions, summaries of developments, and mini-reviews that highlight ecological/environmental science that clearly support an alternative decision.

Please submit contributions using the Open Journal System site here.

Warm regards,

Chris Lortie and Lonnie Aarssen.

A rule-of-thumb for chi-squared tests in systematic reviews

Rule

A chi-squared test with few observations is not a super powerful statistical test (note, apparently termed both chi-square and chi-squared test depending on the discipline and source). Nonetheless, this test useful in systematic reviews to confirm whether observed patterns in the frequency of study of a particular dimension for a topic are statistically different (at least according to about 4/10 referees I have encountered). Not as a vote-counting tool but as a means for the referees and readers of the review to assess whether the counts of approaches, places, species, or some measure used in set of primary studies differed. The mistaken rule-of-thumb is that <5 counts per cell violates the assumptions of chi-squared test. However, this intriguing post reminds that it is not the observed value but the expected value that must be at least 5 (blog post on topic and statistical article describing assumption). I propose that this a reasonable and logical rule-of-thumb for some forms of scientific synthesis such as systematic reviews exploring patterns of research within a set of studies – not the strength of evidence or effect sizes.

An appropriate rule-of-thumb for when you should report a chi-squared test statistic in a systematic review is thus as follows.

When doing a systematic review that includes quantitative summaries of frequencies of various study dimensions, the total sample size of studies summarized (dividend) divided by the potential number of differences in the specific level tested (divisor) should be at least 5 (quotient). You are simply calculating whether the expected values can even reach 5 given your set of studies and the categorical analysis of the frequency of a specific study dimension for the study set applied during your review process.

total number of studies/number of levels contrasted for specific study set dimension >= 5

[In R, I used nrow(main dataframe)/nrow(frequency dataframe for dimension); however, it was a bit clunky. You could use the ‘length’ function or write a new function and use a ‘for loop’ for all factors you are likely to test].

Statistical assumptions aside, it is also reasonable to propose that a practical rule-of-thumb for literature syntheses (systematic reviews and meta-analyses) requires at least 5 studies completed that test each specific level of the factor or attribute summarized.

Example

For example, my colleagues and I were recently doing a systematic review that captured a total of 49 independent primary studies (GitHub repo). We wanted to report frequencies that the specific topic differed in how it was tested by the specific hypothesis (as listed by primary authors), and there were a total of 7 different hypotheses tested within this set of studies.  The division rule-of-thumb for statistical reporting in a review was applied, 49/7 = 7, so we elected to report a chi-squared test in the Results of the manuscript.  Other interesting dimensions of study for the topic had many more levels such as country of study or taxa and violated this rule. In these instances, we simply reported the frequencies in the Results that these aspects were studied without supporting statistics (or we used much simpler classification strategies). A systematic review is a form of formalized synthesis in ecology, and these syntheses typically do not include effect size measure estimates in ecology (other disciplines use the term systematic review interchangeably with meta-analysis, we do not do so in ecology). For these more descriptive review formats, this rule seems appropriate for describing differences in the synthesis of a set studies topologically, i.e. summarizing information about the set of studies, like the meta-data of the data but not the primary data (here is the GitHub repo we used for the specific systematic review that lead to this rule for our team). This fuzzy rule lead to a more interesting general insight. An overly detailed approach to the synthesis of a set of studies likely defeats the purpose of the synthesis.