Modern Clinical Text Mining: A Guide and Review

Bethany Percha

doi:10.1146/annurev-biodatasci-030421-030931

Annual Review of Biomedical Data Science

Volume 4, 2021

Review Article

Affiliations: Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA; email: [email protected]
Vol. 4:165-187 (Volume publication date July 2021) https://doi.org/10.1146/annurev-biodatasci-030421-030931
First published as a Review in Advance on May 26, 2021
Copyright © 2021 by Annual Reviews. All rights reserved

Abstract

Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.

Keywords: clinical text, electronic health record, machine learning, natural language processing, text mining

Literature Cited

1.
Roberts A. 2017. Language, structure, and reuse in the electronic health record. AMA J. Ethics 19:281–88
[Google Scholar]
2.
Hersh WR, Weiner MG, Embi PJ, Logan JR, Payne PR et al. 2013. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med. Care 51:S30–37
[Google Scholar]
3.
Rosenbloom ST, Denny JC, Xu H, Lorenzi N, Stead WW, Johnson KB. 2011. Data from clinical notes: a perspective on the tension between structure and flexible documentation. J. Am. Med. Inf. Assoc. 18:181–86
[Google Scholar]
4.
Hatef E, Rouhizadeh M, Tia I, Lasser E, Hill-Briggs F et al. 2019. Assessing the availability of data on social and behavioral determinants in structured and unstructured electronic health records: a retrospective analysis of a multilevel health care system. JMIR Med. Inform. 7:e13802
[Google Scholar]
5.
Koleck TA, Dreisbach C, Bourne PE, Bakken S. 2019. Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J. Am. Med. Inform. Assoc. 26:364–79
[Google Scholar]
6.
Topol EJ. 2019. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25:44–56
[Google Scholar]
7.
Rajkomar A, Dean J, Kohane I. 2019. Machine learning in medicine. N. Engl. J. Med. 380:1347–58
[Google Scholar]
8.
Kreimeyer K, Foster M, Pandey A, Arya N, Halford G et al. 2017. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J. Biomed. Inform. 73:14–29
[Google Scholar]
9.
Dalianis H. 2018. Clinical Text Mining: Secondary Use of Electronic Patient Records Cham, Switz: Springer Int.
10.
Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB. 1994. A general natural-language text processor for clinical radiology. J. Am. Med. Inform. Assoc. 1:161–74
[Google Scholar]
11.
Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S et al. 2010. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17:507–13
[Google Scholar]
12.
Guzman B, Metzger I, Aphinyanaphongs Y, Grover H et al. 2020. Assessment of Amazon Comprehend Medical: medication information extraction. arXiv:2002.00481 [cs.CL]
13.
Wei WQ, Teixeira PL, Mo H, Cronin RM, Warner JL, Denny JC. 2016. Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance. J. Am. Med. Inform. Assoc. 23:e20–27
[Google Scholar]
14.
Liao KP, Cai T, Savova GK, Murphy SN, Karlson EW et al. 2015. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ 350:h1885
[Google Scholar]
15.
Marafino BJ, Park M, Davies JM, Thombley R, Luft HS et al. 2018. Validation of prediction models for critical care outcomes using natural language processing of electronic health record data. JAMA Netw. Open 1:e185097
[Google Scholar]
16.
Weissman GE, Hubbard RA, Ungar LH, Harhay MO, Greene CS et al. 2018. Inclusion of unstructured clinical text improves early prediction of death or prolonged ICU stay. Crit. Care Med. 46:1125–32
[Google Scholar]
17.
Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F et al. 2018. Clinical information extraction applications: a literature review. J. Biomed. Inform. 77:34–49
[Google Scholar]
18.
Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S et al. 2017. Automated annotation and classification of BI-RADS assessment from radiology reports. J. Biomed. Inform. 69:177–87
[Google Scholar]
19.
Patterson O, Hurdle JF. 2011. Document clustering of clinical narratives: a systematic study of clinical sublanguages. AMIA Annu. Symp. Proc. 2011.1099–107
[Google Scholar]
20.
Mujtaba G, Shuib L, Idris N, Hoo WL, Raj RG et al. 2019. Clinical text classification research trends: systematic literature review and open issues. Expert Syst. Appl. 116:494–520
[Google Scholar]
21.
Shin B, Chokshi FH, Lee T, Choi JD. 2017. Classification of radiology reports using neural attention models. 2017 International Joint Conference on Neural Networks (IJCNN)4363–70 New York: IEEE
[Google Scholar]
22.
Wu S, Roberts K, Datta S, Du J, Ji Z et al. 2020. Deep learning in clinical natural language processing: a methodical review. J. Am. Med. Inform. Assoc. 27:457–70
[Google Scholar]
23.
Spasic I, Nenadic G. 2020. Clinical text data in machine learning: systematic review. JMIR Med. Inform. 8:e17984
[Google Scholar]
24.
Bird S, Klein E, Loper E. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit Sebastapol, CA: O'Reilly Media:
25.
Manning CD, Surdeanu M, Bauer J, Finkel J, Bethard SJ, McClosky D 2014. The Stanford CoreNLP natural language processing toolkit. Association for Computational Linguistics (ACL) System Demonstrations K Bontcheva, J Zhu 55–60 Stroudburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
26.
Qi P, Zhang Y, Zhang Y, Bolton J, Manning CD 2020. Stanza: a Python natural language processing toolkit for many human languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations A Celikyilmaz, T-H Wen 101–8 Stroudburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
27.
Neumann M, King D, Beltagy I, Ammar W 2019. ScispaCy: fast and robust models for biomedical natural language processing. Proceedings of the 18th BioNLP Workshop and Shared Task D Demner-Fushman, KB Cohen, S Ananiadou, J Tsujii 319–27 Stroudburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
28.
Řehůřek R, Sojka P. 2010. Software framework for topic modelling with large corpora. Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks45–50 Valletta, Malta: ELRA http://is.muni.cz/publication/884893/en
[Google Scholar]
29.
Devlin J, Chang MW, Lee K, Toutanova K. 2018. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 [cs.CL]
30.
Aronson AR, Lang FM. 2010. An overview of MetaMap: historical perspective and recent advances. J. Am. Med. Inform. Assoc. 17:229–36
[Google Scholar]
31.
Demner-Fushman D, Rogers WJ, Aronson AR. 2017. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. J. Am. Med. Inform. Assoc. 24:841–44
[Google Scholar]
32.
Zhang Y, Zhang Y, Qi P, Manning CD, Langlotz CP. 2020. Biomedical and clinical English model packages in the Stanza Python NLP library. arXiv:2007.14640 [cs.CL]
33.
Leaman R, Islamaj Doğan R, Lu Z 2013. DNorm: disease name normalization with pairwise learning to rank. Bioinformatics 29:2909–17
[Google Scholar]
34.
Alsentzer E, Murphy JR, Boag W, Weng WH, Jin D et al. 2019. Publicly available clinical BERT embeddings. arXiv:1904.03323 [cs.CL]
35.
Huang K, Altosaar J, Ranganath R. 2019. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv:1904.05342 [cs.CL]
36.
Bodenreider O. 2004. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32:D267–70
[Google Scholar]
37.
Wu Y, Jiang M, Xu J, Zhi D, Xu H. 2017. Clinical named entity recognition using deep learning models. AMIA Annu. Symp. Proc 20171812–19
[Google Scholar]
38.
Boag W, Sergeeva E, Kulshreshtha S, Szolovits P, Rumshisky A, Naumann T. 2018. CliNER 2.0: accessible and accurate clinical concept extraction. arXiv:1803.02245 [cs.CL]
39.
Goeuriot L, Suominen H, Kelly L, Miranda-Escalada A, Krallinger M et al. 2020. Overview of the CLEF eHealth Evaluation Lab 2020. Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020)255–71 Cham, Switz: Springer
[Google Scholar]
40.
Luo YF, Sun W, Rumshisky A. 2019. MCN: a comprehensive corpus for medical concept normalization. J. Biomed. Inform. 92:103132
[Google Scholar]
41.
Pradhan S, Elhadad N, South BR, Martinez D, Christensen L et al. 2015. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J. Am. Med. Inform. Assoc. 22:143–54
[Google Scholar]
42.
Yang X, Bian J, Wu Y. 2018. Detecting medications and adverse drug events in clinical notes using recurrent neural networks. Proc. Mach. Learn. Res. 90:1–6
[Google Scholar]
43.
Jagannatha AN, Yu H. 2016. Structured prediction models for RNN based sequence labeling in clinical text. Proceedings of the Conference on Empirical Methods in Natural Language Processing856–65 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
44.
Liu Z, Yang M, Wang X, Chen Q, Tang B et al. 2017. Entity recognition from clinical texts via recurrent neural network. BMC Med. Inform. Decis. Making 17:67
[Google Scholar]
45.
Dernoncourt F, Lee JY, Uzuner O, Szolovits P. 2017. De-identification of patient notes with recurrent neural networks. J. Am. Med. Inform. Assoc. 24:596–606
[Google Scholar]
46.
Uzuner Ö, South BR, Shen S, DuVall SL 2011. 2010 i2b2/va challenge on concepts, assertions, and relations in clinical text. J. Am. Med. Inform. Assoc. 18:552–56
[Google Scholar]
47.
Jung K, LePendu P, Iyer S, Bauer-Mehren A, Percha B, Shah NH. 2015. Functional evaluation of out-of-the-box text-mining tools for data-mining tasks. J. Am. Med. Inform. Assoc. 22:121–31
[Google Scholar]
48.
Quimbaya AP, Múnera AS, Rivera RAG, Rodríguez JCD, Velandia OMM et al. 2016. Named entity recognition over electronic health records through a combined dictionary-based approach. Procedia Comput. Sci. 100:55–61
[Google Scholar]
49.
Akbik A, Blythe D, Vollgraf R 2018. Contextual string embeddings for sequence labeling. Proceedings of the 27th International Conference on Computational Linguistics EM Bender, L Derczynski, P Isabelle 1638–49 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
50.
Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C 2016. Neural architectures for named entity recognition. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies K Knight, A Nenkova, O Rambow 260–70 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
51.
Huang Z, Xu W, Yu K 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991 [cs.CL]
52.
Settles B. 2004. Biomedical named entity recognition using conditional random fields and rich feature sets. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)107–10 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
53.
Wang Y, Liu S, Afzal N, Rastegar-Mojarad M, Wang L et al. 2018. A comparison of word embeddings for the biomedical natural language processing. J. Biomed. Inform. 87:12–20
[Google Scholar]
54.
Hassanpour S, Langlotz CP. 2016. Information extraction from multi-institutional radiology reports. Artif. Intel. Med. 66:29–39
[Google Scholar]
55.
Chalapathy R, Borzeshi EZ, Piccardi M. 2016. Bidirectional LSTM-CRF for clinical concept extraction. arXiv:1611.08373 [cs.CL]
56.
Unanue IJ, Borzeshi EZ, Piccardi M. 2017. Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition. J. Biomed. Inform. 76:102–9
[Google Scholar]
57.
Tang B, Cao H, Wu Y, Jiang M, Xu H. 2012. Clinical entity recognition using structural support vector machines with rich features. Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics13–20 New York: Assoc. Comput. Linguist.
[Google Scholar]
58.
Lee J, Yoon W, Kim S, Kim D, Kim S et al. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36:1234–40
[Google Scholar]
59.
Luo G, Huang X, Lin CY, Nie Z 2015. Joint entity recognition and disambiguation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing L Màrquez, C Callison-Burch, J Su 879–88 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
60.
Jurafsky D, Martin JH. 2019. Speech and Language Processing Book Draft, , 3rd ed.. https://web.stanford.edu/∼jurafsky/slp3
61.
Leaman R, Khare R, Lu Z. 2015. Challenges in clinical natural language processing for automated disorder normalization. J. Biomed. Inform. 57:28–37
[Google Scholar]
62.
Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S et al. 2018. CLAMP: a toolkit for efficiently building customized clinical natural language processing pipelines. J. Am. Med. Inform. Assoc. 25:331–36
[Google Scholar]
63.
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. 2001. A simple algorithm for identifying negated findings and diseases in discharge summaries. J. Biomed. Inform. 34:301–10
[Google Scholar]
64.
Afshar M, Dligach D, Sharma B, Cai X, Boyda J et al. 2019. Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies. J. Am. Med. Inform. Assoc. 26:1364–69
[Google Scholar]
65.
Suominen H, Salanterä S, Velupillai S, Chapman WW, Savova G et al. 2013. Overview of the ShARe/CLEF eHealth Evaluation Lab 2013. International Conference of the Cross-Language Evaluation Forum for European Languages P Forner, H Muller, R Paredes, P Rosso, B Stein 212–31 Berlin: Springer
[Google Scholar]
66.
Li H, Chen Q, Tang B, Wang X, Xu H et al. 2017. CNN-based ranking for biomedical entity normalization. BMC Bioinform. 18:79–86
[Google Scholar]
67.
Luo YF, Sun W, Rumshisky A. 2019. A hybrid normalization method for medical concepts in clinical narrative using semantic matching. AMIA Summits Transl. Sci. Proc. 2019:732–40
[Google Scholar]
68.
Turchin A, Kolatkar NS, Grant RW, Makhni EC, Pendergrass ML, Einbinder JS. 2006. Using regular expressions to abstract blood pressure and treatment intensification information from the text of physician notes. J. Am. Med. Inform. Assoc. 13:691–95
[Google Scholar]
69.
Hao T, Liu H, Weng C. 2016. Valx: a system for extracting and structuring numeric lab test comparison statements from text. Methods Inform. Med. 55:266–75
[Google Scholar]
70.
Xie T, Zhen Y, Tavakoli M, Hundley G, Ge Y. 2020. A deep-learning based system for accurate extraction of blood pressure data in clinical narratives. AMIA Summits Transl. Sci. Proc. 2020:703–9
[Google Scholar]
71.
Denny JC, Spickard A 3rd, Johnson KB, Peterson NB, Peterson JF, Miller RA 2009. Evaluation of a method to identify and categorize section headers in clinical documents. J. Am. Med. Inform. Assoc. 16:806–15
[Google Scholar]
72.
Pomares-Quimbaya A, Kreuzthaler M, Schulz S. 2019. Current approaches to identify sections within clinical narratives from electronic health records: a systematic review. BMC Med. Res. Methodol. 19:155
[Google Scholar]
73.
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems CJC Burges, L Bottou, M Welling, Z Ghahramani, KQ Weinberger 3111–19 https://papers.nips.cc/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html
[Google Scholar]
74.
Pennington J, Socher R, Manning CD. 2014. GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) A Moschitti, B Pang, W Daelemans 1532–43 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
75.
Wu Y, Xu J, Jiang M, Zhang Y, Xu H. 2015. A study of neural word embeddings for named entity recognition in clinical text. AMIA Annu. Symp. Proc. 2015:1326–33
[Google Scholar]
76.
Wu Y, Xu J, Zhang Y, Xu H 2015. Clinical abbreviation disambiguation using neural word embeddings. Proceedings of BioNLP 15 KB Cohen, D Demner-Fushman, S Ananiadou, J Tsujii 171–76 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
77.
Percha B, Zhang Y, Bozkurt S, Rubin D, Altman RB, Langlotz CP. 2018. Expanding a radiology lexicon using contextual patterns in radiology reports. J. Am. Med. Inform. Assoc. 25:679–85
[Google Scholar]
78.
Fan Y, Pakhomov S, McEwan R, Zhao W, Lindemann E, Zhang R. 2019. Using word embeddings to expand terminology of dietary supplements on clinical notes. JAMIA Open 2:246–53
[Google Scholar]
79.
Lastra-Díaz JJ, Goikoetxea J, Taieb MAH, García-Serrano A, Aouicha MB, Agirre E 2019. A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art. Eng. Appl. Artif. Intel. 85:645–65
[Google Scholar]
80.
Beam A, Kompa B, Schmaltz A, Fried I, Weber G et al. 2020. Clinical concept embeddings learned from massive sources of multimodal medical data. Pac. Symp. Biocomput. 25:295–306
[Google Scholar]
81.
Baumel T, Nassour-Kassis J, Cohen R, Elhadad M, Elhadad N. 2017. Multi-label classification of patient notes a case study on ICD code assignment. arXiv:1709.09587 [cs.CL]
82.
Banerjee I, Chen MC, Lungren MP, Rubin DL. 2018. Radiology report annotation using intelligent word embeddings: applied to multi-institutional chest CT cohort. J. Biomed. Inform. 77:11–20
[Google Scholar]
83.
Miotto R, Li L, Kidd BA, Dudley JT. 2016. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6:26094
[Google Scholar]
84.
Turney PD, Pantel P. 2010. From frequency to meaning: vector space models of semantics. J. Artif. Intel. Res. 37:141–88
[Google Scholar]
85.
Kalyan KS, Sangeetha S. 2020. SECNLP: a survey of embeddings in clinical natural language processing. J. Biomed. Inform. 101:103323
[Google Scholar]
86.
Peng Y, Yan S, Lu Z 2019. Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. Proceedings of the 18th BioNLP Workshop and Shared Task D Demner-Fushman, KB Cohen, S Ananiadou, J Tsujii 58–65 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
87.
Johnson AE, Pollard TJ, Shen L, Li-Wei HL, Feng M et al. 2016. MIMIC-III, a freely accessible critical care database. Sci. Data 3:160035
[Google Scholar]
88.
Si Y, Wang J, Xu H, Roberts K. 2019. Enhancing clinical concept extraction with contextual embeddings. J. Am. Med. Inform. Assoc. 26:1297–304
[Google Scholar]
89.
Li F, Jin Y, Liu W, Rawat BPS, Cai P, Yu H. 2019. Fine-tuning bidirectional encoder representations from transformers (BERT)–based models on large-scale electronic health record notes: an empirical study. JMIR Med. Inform. 7:e14830
[Google Scholar]
90.
Xu D, Gopale M, Zhang J, Brown K, Begoli E, Bethard S. 2020. Unified medical language system resources improve sieve-based generation and bidirectional encoder representations from transformers (BERT)–based ranking for concept normalization. J. Am. Med. Inform. Assoc. 27:101510–19
[Google Scholar]
91.
Miotto R, Percha BL, Glicksberg BS, Lee HC, Cruz L et al. 2020. Identifying acute low back pain episodes in primary care practice from clinical notes: observational study. JMIR Med. Inform. 8:e16878
[Google Scholar]
92.
Hassanpour S, Langlotz CP, Amrhein TJ, Befera NT, Lungren MP. 2017. Performance of a machine learning classifier of knee MRI reports in two large academic radiology practices: a tool to estimate diagnostic yield. Am. J. Roentgenol. 208:750–53
[Google Scholar]
93.
Wang Y, Sohn S, Liu S, Shen F, Wang L et al. 2019. A clinical text classification paradigm using weak supervision and deep representation. BMC Med. Informat. Decis. Making 19:1
[Google Scholar]
94.
Stanfill MH, Williams M, Fenton SH, Jenders RA, Hersh WR. 2010. A systematic literature review of automated clinical coding and classification systems. J. Am. Med. Inform. Assoc. 17:646–51
[Google Scholar]
95.
Manning CD, Schütze H, Raghavan P. 2008. Introduction to Information Retrieval Cambridge, UK: Cambridge Univ. Press
96.
Shao Y, Taylor S, Marshall N, Morioka C, Zeng-Treitler Q. 2018. Clinical text classification with word embedding features versus bag-of-words features. 2018 IEEE International Conference on Big Data (Big Data)2874–78 New York: IEEE
[Google Scholar]
97.
Buchan K, Filannino M, Uzuner Ö 2017. Automatic prediction of coronary artery disease from clinical narratives. J. Biomed. Inform. 72:23–32
[Google Scholar]
98.
Kocbek S, Cavedon L, Martinez D, Bain C, Mac Manus C et al. 2016. Text mining electronic hospital records to automatically classify admissions against disease: measuring the impact of linking data sources. J. Biomed. Inform. 64:158–67
[Google Scholar]
99.
Garla VN, Brandt C. 2012. Ontology-guided feature engineering for clinical text classification. J. Biomed. Inform. 45:992–98
[Google Scholar]
100.
Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67:301–20
[Google Scholar]
101.
Marafino BJ, Boscardin WJ, Dudley RA. 2015. Efficient and sparse feature selection for biomedical text classification via the elastic net: application to ICU risk stratification from nursing notes. J. Biomed. Inform. 54:114–20
[Google Scholar]
102.
Lucini FR, Fogliatto FS, da Silveira GJ, Neyeloff JL, Anzanello MJ et al. 2017. Text mining approach to predict hospital admissions using early medical records from the emergency department. Int. J. Med. Inform. 100:1–8
[Google Scholar]
103.
Kavuluru R, Rios A, Lu Y. 2015. An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records. Artif. Intel. Med. 65:155–66
[Google Scholar]
104.
Rios A, Kavuluru R. 2015. Convolutional neural networks for biomedical text classification: application in indexing biomedical articles. Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics258–67 New York: Assoc. Comput. Mach.
[Google Scholar]
105.
Mullenbach J, Wiegreffe S, Duke J, Sun J, Eisenstein J 2018. Explainable prediction of medical codes from clinical text. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 M Walker, H Ji, A Stent 1101–11 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
106.
Chen MC, Ball RL, Yang L, Moradzadeh N, Chapman BE et al. 2018. Deep learning to classify radiology free-text reports. Radiology 286:845–52
[Google Scholar]
107.
Yao L, Mao C, Luo Y. 2019. Clinical text classification with rule-based features and knowledge-guided convolutional neural networks. BMC Med. Inform. Decis. Making 19:71
[Google Scholar]
108.
Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT et al. 2018. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. PLOS ONE 13:e0192360
[Google Scholar]
109.
Uzuner O, Bodnari A, Shen S, Forbush T, Pestian J, South BR. 2012. Evaluating the state of the art in coreference resolution for electronic medical records. J. Am. Med. Inform. Assoc. 19:786–91
[Google Scholar]
110.
Sun W, Rumshisky A, Uzuner O. 2013. Evaluating temporal relations in clinical text: 2012 i2b2 challenge. J. Am. Med. Inform. Assoc. 20:806–13
[Google Scholar]
111.
Bethard S, Derczynski L, Savova G, Pustejovsky J, Verhagen M 2015. Semeval-2015 task 6: clinical tempeval. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) P Nakov, T Zesch, D Cer, D Jurgens 806–14 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
112.
Rumshisky A, Ghassemi M, Naumann T, Szolovits P, Castro V et al. 2016. Predicting early psychiatric readmission with natural language processing of narrative discharge summaries. Transl. Psychiatry 6:e921
[Google Scholar]
113.
Agarwal A, Baechle C, Behara R, Zhu X. 2017. A natural language processing framework for assessing hospital readmissions for patients with COPD. IEEE J. Biomed. Health Inform. 22:588–96
[Google Scholar]
114.
Young IJB, Luz S, Lone N 2019. A systematic review of natural language processing for classification tasks in the field of incident reporting and adverse event analysis. Int. J. Med. Inform. 132:103971
[Google Scholar]
115.
Osborne JD, Wyatt M, Westfall AO, Willig J, Bethard S, Gordon G 2016. Efficient identification of nationally mandated reportable cancer cases using natural language processing and machine learning. J. Am. Med. Inform. Assoc. 23:1077–84
[Google Scholar]
116.
Chen W, Huang Y, Boyle B, Lin S 2016. The utility of including pathology reports in improving the computational identification of patients. J. Pathol. Inform. 7:46
[Google Scholar]
117.
Venkataraman GR, Pineda AL, Bear Don't Walk IV OJ, Zehnder AM, Ayyar S et al. 2020. FasTag: automatic text classification of unstructured medical narratives. PLOS ONE 15:e0234647
[Google Scholar]
118.
Roysden N, Wright A. 2015. Predicting health care utilization after behavioral health referral using natural language processing and machine learning. AMIA Annu. Symp. Proc. 2015:2063–72
[Google Scholar]
119.
Sabbir A, Jimeno-Yepes A, Kavuluru R. 2017. Knowledge-based biomedical word sense disambiguation with neural concept embeddings. 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE)163–70 New York: IEEE
[Google Scholar]
120.
Fries J, Wu S, Ratner A, Ré C. 2017. SwellShark: a generative model for biomedical named entity recognition without labeled data. arXiv:1704.06360 [cs.CL]
121.
Callahan A, Fries JA, Ré C, Huddleston JI, Giori NJ et al. 2019. Medical device surveillance with electronic health records. NPJ Digital Med. 2:94
[Google Scholar]
122.
Chen Y, Lasko TA, Mei Q, Denny JC, Xu H. 2015. A study of active learning methods for named entity recognition in clinical text. J. Biomed. Inform. 58:11–18
[Google Scholar]
123.
Kholghi M, Sitbon L, Zuccon G, Nguyen A. 2016. Active learning: a step towards automating medical concept extraction. J. Am. Med. Inform. Assoc. 23:289–96
[Google Scholar]
124.
Meystre SM, Thibault J, Shen S, Hurdle JF, South BR. 2010. Textractor: a hybrid system for medications and reason for their prescription extraction from clinical text documents. J. Am. Med. Inform. Assoc. 17:559–62
[Google Scholar]
125.
Sahu S, Anand A, Oruganty K, Gattu M 2016. Relation extraction from clinical texts using domain invariant convolutional neural network. Proceedings of the 15th Workshop on Biomedical Natural Language Processing KB Cohen, D Demner-Fushman, S Ananiadou, J Tsujii 206–15 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
126.
Luo Y, Cheng Y, Uzuner Ö, Szolovits P, Starren J 2018. Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes. J. Am. Med. Inform. Assoc. 25:93–98
[Google Scholar]
127.
Li Z, Yang Z, Shen C, Xu J, Zhang Y, Xu H. 2019. Integrating shortest dependency path and sentence sequence into a deep learning framework for relation extraction in clinical text. BMC Med. Inform. Decis. Making 19:22
[Google Scholar]
128.
Rink B, Harabagiu S, Roberts K. 2011. Automatic extraction of relations between medical concepts in clinical texts. J. Am. Med. Inform. Assoc. 18:594–600
[Google Scholar]
129.
Munkhdalai T, Liu F, Yu H. 2018. Clinical relation extraction toward drug safety surveillance using electronic health record narratives: classical learning versus deep learning. JMIR Public Health Surveill. 4:e29
[Google Scholar]
130.
Henry S, Buchan K, Filannino M, Stubbs A, Uzuner O. 2020. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. J. Am. Med. Inform. Assoc. 27:3–12
[Google Scholar]
131.
Alfattni G, Peek N, Nenadic G. 2020. Extraction of temporal relations from clinical free text: a systematic review of current approaches. J. Biomed. Inform. 108:103488
[Google Scholar]
132.
Styler WF IV, Bethard S, Finan S, Palmer M, Pradhan S et al. 2014. Temporal annotation in the clinical domain. Trans. Assoc. Comput. Linguist. 2:143–54
[Google Scholar]
133.
Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. 2013. A hybrid system for temporal information extraction from clinical text. J. Am. Med. Inform. Assoc. 20:828–35
[Google Scholar]
134.
Dligach D, Miller T, Lin C, Bethard S, Savova G 2017. Neural temporal relation extraction. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Vol. 2 M Lapata, P Blunsom, A Koller 746–51 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
135.
Tourille J, Ferret O, Neveol A, Tannier X. 2017. Neural architecture for temporal relation extraction: a Bi-LSTM approach for detecting narrative containers. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics R Barzilay, M-Y Kan , Vol. 2224–30 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
136.
Lin C, Miller T, Dligach D, Bethard S, Savova G 2019. A BERT-based universal model for both within-and cross-sentence clinical temporal relation extraction. Proceedings of the 2nd Clinical Natural Language Processing Workshop A Rumshisky, K Roberts, S Bethard, T Naumann 65–71 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
137.
Romanov A, Shivade C 2018. Lessons from natural language inference in the clinical domain. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing E Riloff, D Chiang, J Hockenmaier, J Tsujii 1586–96 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
138.
Lee LH, Lu Y, Chen PH, Lee PL, Shyu KK 2019. NCUEE at MEDIQA 2019: medical text inference using ensemble BERT-BiLSTM-attention model. Proceedings of the 18th BioNLP Workshop and Shared Task D Demner-Fushman, KB Cohen, S Ananiadou, J Tsujii 528–32 Stroudsburg, PA: Assoc. Comput. Linguist.
[Google Scholar]
139.
Lu M, Fang Y, Yan F, Li M. 2019. Incorporating domain knowledge into natural language inference on clinical texts. IEEE Access 7:57623–32
[Google Scholar]
140.
Sharma S, Santra B, Jana A, Santosh T, Ganguly N, Goyal P 2019. Incorporating domain knowledge into medical NLI using knowledge graphs. arXiv:1909.00160 [cs.CL]
141.
Chapman WW, Nadkarni PM, Hirschman L, D'Avolio LW, Savova GK, Uzuner O. 2011. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J. Am. Med. Inform. Assoc. 18:540–43
142.
Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. 2016. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J. Am. Med. Inform. Assoc. 23:1007–15
143.
Khattak FK, Jeblee S, Pou-Prom C, Abdalla M, Meaney C, Rudzicz F. 2019. A survey of word embeddings for clinical text. J. Biomed. Inform. X 4:100057
144.
Velupillai S, Suominen H, Liakata M, Roberts A, Shah AD et al. 2018. Using clinical natural language processing for health outcomes research: overview and actionable suggestions for future advances. J. Biomed. Inform. 88:11–19
145.
Off. Natl. Coord. Health Inf. Technol. 2019. Office-based physician electronic health record adoption. Tech. Rep. Quick-Stat #50, Off. Natl. Coord. Health Inf. Technol., U.S. Dep. Health Hum. Serv., Washington, DC
146.
Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N et al. 2018. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1:18
147.
Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. 2019. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110:12–22
148.
Castro VM, Minnier J, Murphy SN, Kohane I, Churchill SE et al. 2015. Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am. J. Psychiatry 172:363–72
149.
Hoogendoorn M, Szolovits P, Moons LM, Numans ME. 2016. Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. Artif. Intell. Med. 69:53–61
150.
Marcus G. 2018. Deep learning: a critical appraisal. arXiv:1801.00631 [cs.AI]
