A Systematic Statistical Approach to Evaluating Evidence from Observational Studies

David Madigan; Paul E. Stang; Jesse A. Berlin; Martijn Schuemie; J. Marc Overhage; Marc A. Suchard; Bill Dumouchel; Abraham G. Hartzema; Patrick B. Ryan

doi:10.1146/annurev-statistics-022513-115645

Annual Review of Statistics and Its Application

Volume 1, 2014

Review Article



David Madigan^1,2, Paul E. Stang^2,3, Jesse A. Berlin^4, Martijn Schuemie^2,3, J. Marc Overhage^2,5, Marc A. Suchard^2,6,7,8, Bill Dumouchel^2,9, Abraham G. Hartzema^2,10, and Patrick B. Ryan^2,3

Affiliations:
¹Department of Statistics, Columbia University, New York, New York 10027
²Observational Medical Outcomes Partnership, Foundation for the National Institutes of Health, Bethesda, Maryland 20810
³Janssen Research and Development LLC, Titusville, New Jersey 08560
⁴Johnson & Johnson, New Brunswick, New Jersey 08901
⁵Siemens Health Services, Malvern, Pennsylvania 19355
⁶Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, California 90095
⁷Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California 90095
⁸Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, California 90095
⁹Oracle Health Sciences, Burlington, Massachusetts 01803
¹⁰College of Pharmacy, University of Florida, Gainesville, Florida 32610
Vol. 1:11-39 (Volume publication date January 2014) https://doi.org/10.1146/annurev-statistics-022513-115645
First published as a Review in Advance on November 20, 2013
© Annual Reviews

Abstract

Threats to the validity of observational studies on the effects of interventions raise questions about the appropriate role of such studies in decision making. Nonetheless, scholarly journals in fields such as medicine, education, and the social sciences feature many such studies, often with limited exploration of these threats, and the lay press is rife with news stories based on these studies. Consumers of these studies rely on the expertise of the study authors to conduct appropriate analyses, and on the thoroughness of the scientific peer-review process to check the validity, but the introspective and ad hoc nature of the design of these analyses appears to elude any meaningful objective assessment of their performance. Here, we review some of the challenges encountered in observational studies and review an alternative, data-driven approach to observational study design, execution, and analysis. Although much work remains, we believe this research direction shows promise.

Keywords: data interpretation, electronic health records, epidemiology, observational studies, pharmacovigilance, statistical


