Abstract

Machine learning is a field at the intersection of statistics and computer science that uses algorithms to extract information and knowledge from data. Its applications increasingly find their way into economics, political science, and sociology. We offer a brief introduction to this vast toolbox and illustrate its current uses in the social sciences, including distilling measures from new data sources, such as text and images; characterizing population heterogeneity; improving causal inference; and offering predictions to aid policy decisions and theory development. We argue that, in addition to serving similar purposes in sociology, machine learning tools can speak to long-standing questions on the limitations of the linear modeling framework, the criteria for evaluating empirical findings, transparency around the context of discovery, and the epistemological core of the discipline.

/content/journals/10.1146/annurev-soc-073117-041106
2019-07-30
2024-05-26
  • Article Type: Review Article