Web-Based Enrollment and Other Types of Self-Selection in Surveys and Studies: Consequences for Generalizability

Niels Keiding; Thomas A. Louis

doi:10.1146/annurev-statistics-031017-100127

Annual Review of Statistics and Its Application

Volume 5, 2018

Review Article

Niels Keiding¹ and Thomas A. Louis²

Affiliations:
¹Department of Biostatistics, University of Copenhagen, Copenhagen DK-1014, Denmark
²Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
Vol. 5:25–47 (Volume publication date March 2018) https://doi.org/10.1146/annurev-statistics-031017-100127
First published as a Review in Advance on October 13, 2017
© Annual Reviews

Abstract

Web-based enrollment in surveys and studies is increasingly attractive as the Internet is approaching near-universal coverage and the attitude of respondents toward participation in classical modes of study deteriorates. Follow-up is also facilitated by the web-based approach. However, the consequent self-selection raises the question of the importance of representativity when attempting to generalize the results of a study beyond the context in which they were obtained, particularly under effect heterogeneity. Our review is divided into three main components: first, sample surveys or prevalence studies, assessing the frequency or prevalence of some attitude or disease condition in a population from its frequency in a sample from this population; second, generalization of the results from randomized trials to the population in which they were performed and to other populations; and third, generalization of results from observational studies.

Keywords: external validity, internal validity, nonprobability samples, representativity, transportability, unmeasured confounders
