Robust Small Area Estimation: An Overview

Jiming Jiang; J. Sunil Rao

doi:10.1146/annurev-statistics-031219-041212

Annual Review of Statistics and Its Application

Volume 7, 2020

Review Article


Affiliations: Jiming Jiang, Department of Statistics, University of California, Davis, California 95616, USA; J. Sunil Rao, Department of Public Health Sciences, University of Miami, Miami, Florida 33136, USA
Vol. 7:337–360 (Volume publication date March 2020) https://doi.org/10.1146/annurev-statistics-031219-041212
Copyright © 2020 by Annual Reviews. All rights reserved

Abstract

A small area typically refers to a subpopulation or domain of interest for which a reliable direct estimate, based only on the domain-specific sample, cannot be produced because the sample size in the domain is small. While traditional small area methods and models are widely used nowadays, there has also been much work and interest in robust statistical inference for small area estimation (SAE). We survey this work and provide a comprehensive review here. We begin with a brief review of traditional SAE methods. We then discuss SAE methods that are developed under weaker assumptions and SAE methods that are robust in certain respects, for example to outliers or model failure. Our discussion also includes topics such as nonparametric SAE methods, Bayesian approaches, model selection and diagnostics, and missing data. A brief review of software packages available for implementing robust SAE methods is also given.

Keyword(s): borrowing strength, empirical best linear unbiased prediction, empirical best prediction, mean squared prediction error, method of moments, model failure, model misspecification, model selection, nonparametric, observed best prediction, outliers, robustness, small area estimation
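To make "borrowing strength" and empirical best linear unbiased prediction (EBLUP) concrete, here is a minimal sketch under stated assumptions. It assumes the area-level Fay–Herriot model, y_i = x_i'β + v_i + e_i with area effects v_i ~ N(0, A) and known sampling variances D_i, chosen here only as one familiar traditional SAE model. Each direct estimate y_i is shrunk toward the regression-synthetic estimate x_i'β̂ with weight γ_i = A/(A + D_i), and A is estimated by the method of moments (a Prasad–Rao-type estimator). The optional `huber_b` argument is a hypothetical parameter of this sketch: it bounds the standardized residual with a Huber ψ-function, in the spirit of outlier-robust EBLUP variants, but it is not presented as any particular author's estimator. The function names are illustrative and do not come from any SAE package.

```python
import numpy as np


def prasad_rao_A(y, X, D):
    """Method-of-moments estimate of the area-effect variance A.

    Assumes the Fay-Herriot model y_i = x_i'beta + v_i + e_i with
    v_i ~ N(0, A) and sampling errors e_i ~ N(0, D_i), D_i known.
    Uses E[sum of squared OLS residuals] = (m - p) A + sum (1 - h_ii) D_i.
    """
    m, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)              # OLS residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)      # leverages h_ii
    A_hat = (resid @ resid - np.sum((1.0 - h) * D)) / (m - p)
    return max(A_hat, 0.0)                           # truncate at zero


def fay_herriot_eblup(y, X, D, huber_b=None):
    """EBLUP of the small area means; optionally Huber-bound the residuals.

    huber_b is a hypothetical tuning constant for this sketch: None gives
    the ordinary EBLUP, a finite value clips standardized residuals.
    Returns (estimates, synthetic estimates, A_hat).
    """
    A = prasad_rao_A(y, X, D)
    w = 1.0 / (A + D)                                 # GLS weights
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    synthetic = X @ beta                              # regression-synthetic part
    scale = np.sqrt(A + D)
    r = (y - synthetic) / scale                       # standardized residuals
    if huber_b is not None:
        r = np.clip(r, -huber_b, huber_b)             # Huber psi_b
    # Unclipped: A / scale * r == gamma_i * (y_i - synthetic_i), gamma_i = A / (A + D_i)
    return synthetic + A / scale * r, synthetic, A


# Usage on simulated data (illustrative only, not from the review).
rng = np.random.default_rng(0)
m = 30
X = np.column_stack([np.ones(m), rng.normal(size=m)])
D = rng.uniform(0.5, 1.5, size=m)                     # known sampling variances
theta = X @ np.array([1.0, 2.0]) + rng.normal(size=m) # true area means, A = 1
y = theta + rng.normal(scale=np.sqrt(D))              # direct estimates
y[0] += 20.0                                          # plant one outlying area

eblup, synth, A_hat = fay_herriot_eblup(y, X, D)
robust, _, _ = fay_herriot_eblup(y, X, D, huber_b=1.345)
print(A_hat, eblup[0], robust[0], synth[0])
```

Setting `huber_b=None` reproduces the ordinary EBLUP, so the bounded version can be compared with it area by area. Only the prediction step is robustified here; a fuller outlier-robust procedure would typically also estimate β and A in an outlier-resistant way.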

