Machine Learning Methods That Economists Should Know About

Susan Athey; Guido W. Imbens

doi:10.1146/annurev-economics-080217-053433


Susan Athey¹,²,³ and Guido W. Imbens¹,²,³,⁴

Affiliations:
¹Graduate School of Business, Stanford University, Stanford, California 94305, USA
²Stanford Institute for Economic Policy Research, Stanford University, Stanford, California 94305, USA
³National Bureau of Economic Research, Cambridge, Massachusetts 02138, USA
⁴Department of Economics, Stanford University, Stanford, California 94305, USA
Vol. 11:685-725 (Volume publication date August 2019) https://doi.org/10.1146/annurev-economics-080217-053433
First published as a Review in Advance on June 10, 2019
Copyright © 2019 by Annual Reviews. All rights reserved

Abstract

We discuss the relevance of the recent machine learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods, and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the ML literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, and matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, including causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

Keywords: causal inference, econometrics, JEL C30, machine learning






Annual Review of Economics 11, 685 (2019); https://doi.org/10.1146/annurev-economics-080217-053433


Article Type: Review Article
