Defining Phenotypes from Clinical Data to Drive Genomic Research

Jamie R. Robinson; Wei-Qi Wei; Dan M. Roden; Joshua C. Denny

doi:10.1146/annurev-biodatasci-080917-013335

Annual Review of Biomedical Data Science

Volume 1, 2018

Review Article



Jamie R. Robinson¹,², Wei-Qi Wei¹, Dan M. Roden¹,³,⁴, and Joshua C. Denny¹,³

Affiliations:
¹Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA; email: [email protected]
²Department of General Surgery, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
³Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
⁴Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
Vol. 1:69-92 (Volume publication date July 2018) https://doi.org/10.1146/annurev-biodatasci-080917-013335
First published as a Review in Advance on April 25, 2018
Copyright © 2018 by Annual Reviews. All rights reserved

Abstract

The rise in available longitudinal patient information in electronic health records (EHRs) and their coupling to DNA biobanks have resulted in a dramatic increase in genomic research using EHR data for phenotypic information. EHRs have the benefit of providing a deep and broad data source of health-related phenotypes, including drug response traits, expanding the phenomes available to researchers for discovery. The earliest efforts at repurposing EHR data for research involved manual chart review of limited numbers of patients but now typically involve applications of rule-based and machine learning algorithms operating on sometimes huge corpora for both genome-wide and phenome-wide approaches. In this review, we highlight the current methods, impact, challenges, and opportunities for repurposing clinical data to define patient phenotypes for genomic discovery. Use of EHR data has proven a powerful method for elucidating genomic influences on diseases, traits, and drug-response phenotypes and will continue to have increasing applications in large cohort studies.

Keywords: biobank, electronic health record, genomics, GWAS, phenotyping, PheWAS


