Due to rapid technological advances, researchers are now able to collect and analyze ever larger data sets. Statistical inference for big data often requires solving thousands or even millions of parallel inference problems simultaneously. This poses significant challenges and calls for new principles, theories, and methodologies. This review provides a selective survey of some recently developed methods and results for large-scale statistical inference, including detection, estimation, and multiple testing. We begin with the global testing problem, where the goal is to detect the existence of sparse signals in a data set, and then move to the problem of estimating the proportion of nonnull effects. Finally, we focus on multiple testing with false discovery rate (FDR) control. The FDR provides a powerful and practical approach to large-scale multiple testing and has been successfully used in a wide range of applications. We discuss several effective data-driven procedures and also present efficient strategies to handle various grouping, hierarchical, and dependency structures in the data.
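As an illustration of FDR control, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure, assuming independent p-values. The abstract does not say which procedures the review covers in detail, so this is only a textbook example of the idea and not the authors' own method. The function name `benjamini_hochberg`, the parameter `alpha`, and the sample p-values are hypothetical choices made for this sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up rule; the
    name and signature are assumptions made for this example.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, alpha=0.05)
print(decisions)
```

Because the rule looks for the largest qualifying rank, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, provided a larger p-value further down the ordering meets its threshold.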






  • Article Type: Review Article