Abstract

Many fields of evolutionary biology now depend on stochastic mathematical models. These models are valuable for their ability to formalize predictions in the face of uncertainty and provide a quantitative framework for testing hypotheses. However, no mathematical model will fully capture biological complexity. Instead, these models attempt to capture the important features of biological systems using relatively simple mathematical principles. These simplifications can allow us to focus on differences that are meaningful, while ignoring those that are not. However, simplification also requires assumptions, and to the extent that these are wrong, our ability to predict or compare suffers accordingly. Here, we discuss approaches for evaluating the performance of evolutionary models in light of their assumptions by comparing them against reality. We highlight general approaches, how they are applied, and remaining opportunities. Absolute tests of fit, even when not explicitly framed as such, are fundamental to progress in understanding evolution.

/content/journals/10.1146/annurev-ecolsys-110617-062249
2018-11-02
2024-04-20

  • Article Type: Review Article