Abstract

One of the most important tasks in sports analytics is the development of binary response models for head-to-head game outcomes to estimate team and player strength. We discuss commonly used probability models for game outcomes, including the Bradley–Terry and Thurstone–Mosteller models, as well as extensions to ties as a third outcome and to the inclusion of a home-field advantage. We consider dynamic extensions to these models to account for the evolution of competitor strengths over time. Full likelihood-based analyses of these time-varying models can be simplified into rating systems, such as the Elo and Glicko rating systems. We present other modern rating systems, including popular methods for online gaming, and novel systems that have been implemented for online chess and Go. The discussion of the analytic methods is accompanied by examples of where these approaches have been implemented for various gaming organizations, as well as a detailed application to National Basketball Association game outcomes.
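To make the models named above concrete, here is a minimal Python sketch of the head-to-head win probabilities and a single Elo update. It is an illustration under stated assumptions, not the article's implementation. Strengths `theta` are assumed to be on the logit scale. The home-field advantage is assumed to be an additive term that favors the first competitor, which is one common parameterization. Davidson's tie parameter `nu` is taken as given rather than estimated. The Elo constants (the 400-point scale and `k=32`) are conventional defaults that rating organizations vary. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def bradley_terry(theta_i: float, theta_j: float, home: float = 0.0) -> float:
    """P(i beats j) under Bradley-Terry (logit link).

    `home` is an assumed additive home-field advantage for competitor i.
    """
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j + home)))


def thurstone_mosteller(theta_i: float, theta_j: float, home: float = 0.0) -> float:
    """P(i beats j) under Thurstone-Mosteller (probit link, unit variance)."""
    return NormalDist().cdf(theta_i - theta_j + home)


def davidson(theta_i: float, theta_j: float, nu: float) -> tuple[float, float, float]:
    """(P(i wins), P(tie), P(j wins)) under Davidson's tie extension.

    Uses pi = exp(theta) and a tie parameter nu >= 0 (assumed known here).
    """
    pi_i, pi_j = math.exp(theta_i), math.exp(theta_j)
    tie = nu * math.sqrt(pi_i * pi_j)
    denom = pi_i + pi_j + tie
    return pi_i / denom, tie / denom, pi_j / denom


def elo_expected(r_a: float, r_b: float, home: float = 0.0) -> float:
    """Elo expected score for A against B on the conventional 400-point scale."""
    return 1.0 / (1.0 + 10 ** (-(r_a - r_b + home) / 400))


def elo_update(r_a: float, r_b: float, score_a: float,
               k: float = 32.0, home: float = 0.0) -> tuple[float, float]:
    """One Elo update after a single game.

    score_a is 1 for an A win, 0.5 for a draw, 0 for an A loss.
    k = 32 is an illustrative default; real systems tune it.
    """
    e_a = elo_expected(r_a, r_b, home)
    delta = k * (score_a - e_a)
    # Zero-sum update: B loses exactly what A gains.
    return r_a + delta, r_b - delta
```

For two 1500-rated players, a win by the first gives `elo_update(1500, 1500, 1.0) == (1516.0, 1484.0)`. Because 10^(-x/400) = exp(-x ln(10)/400), the Elo expected score equals the Bradley–Terry probability with ratings rescaled by ln(10)/400. This is consistent with the point above that Elo can be viewed as a simplification of a likelihood-based analysis of a dynamic Bradley–Terry model.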

DOI: 10.1146/annurev-statistics-040722-061813
2025-03-07
2025-04-19

Article Type: Review Article