Abstract

In this article, we provide a personal review of the literature on nonparametric and robust tools for the standard univariate and multivariate location and scatter problems, as well as for linear regression problems, with a special focus on sign and rank methods, their equivariance and invariance properties, and their robustness and efficiency. Beyond parametric models, the population quantities of interest are often formulated as location, scatter, skewness, kurtosis, and other functionals. Some old and recent tools for model checking, dimension reduction, and subspace estimation in wide semiparametric models are discussed. We also discuss recent extensions of these procedures to certain nonstandard semiparametric cases, including clustered and matrix-valued data. Our personal list of important unsolved and future issues is also provided.

/content/journals/10.1146/annurev-statistics-031017-100247
2018-03-07
2024-04-26

  • Article Type: Review Article