Abstract

This article provides a selective overview of recent developments in factor models and their applications in econometric learning. We focus on the low-rank structure of factor models and, in particular, on estimating the model from the low-rank recovery point of view. The survey consists of three main parts. The first part reviews new factor estimation methods based on modern techniques for recovering low-rank structures in high-dimensional models. The second part discusses statistical inference for several factor-augmented models and their applications in statistical learning. The final part summarizes new developments in handling unbalanced panels from the matrix completion perspective.

/content/journals/10.1146/annurev-financial-091420-011735
2021-11-01
2025-02-09
  • Article Type: Review Article