When phylogenetic assumptions are violated: base compositional heterogeneity and among‐site rate variation in beetle mitochondrial phylogenomics Academic Article uri icon


  • The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among-site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among-site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among-site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among-site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis. © 2010 The Authors. Journal compilation. © 2010 The Royal Entomological Society.

author list (cited authors)


citation count

  • 101

publication date

  • July 2010