Abstract

Researchers apply sampling weights to account for unequal sample selection probabilities, frame coverage errors, and nonresponse. If researchers do not weight when appropriate, they risk biased estimates. Conversely, when they apply weights unnecessarily, they can create an inefficient estimator without reducing bias. Yet in practice researchers rarely test the necessity of weighting and are sometimes guided more by the current practice in their field than by scientific evidence. In addition, statistical tests for weighting are not widely known or available. This article reviews empirical tests to determine whether weighted analyses are justified. We focus on regression models, though the review's implications extend beyond regression. We find that nearly all weighting tests fall into two categories: difference in coefficients tests and weight association tests. We describe the distinguishing features of each category, present their properties, and explain the close relationship between them. We review the simulation evidence on their sampling properties in finite samples. Finally, we highlight the unanswered theoretical and practical questions that surround these tests and that deserve further research.

/content/journals/10.1146/annurev-statistics-011516-012958
2016-06-01
2024-03-29

  • Article Type: Review Article