Annual Review of Biomedical Data Science - Volume 1, 2018
Big Data Approaches for Modeling Response and Resistance to Cancer Drugs
Vol. 1 (2018), pp. 1–27

Despite significant progress in cancer research, current standard-of-care drugs fail to cure many types of cancers. Hence, there is an urgent need to identify better predictive biomarkers and treatment regimens. Conventionally, insights from hypothesis-driven studies have been the primary force behind cancer biology and therapeutic discoveries. Recently, the rapid growth of big data resources, catalyzed by breakthroughs in high-throughput technologies, has resulted in a paradigm shift in cancer therapeutic research. The combination of computational methods and genomics data has led to several successful clinical applications. In this review, we focus on recent advances in data-driven methods to model anticancer drug efficacy, and we present the challenges and opportunities for data science in cancer therapeutic research.
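As a hedged illustration of the kind of data-driven modeling such reviews survey, the sketch below fits a regularized regression of drug response on gene expression. The matrices are random placeholders rather than real screening data, and ridge regression stands in for the many model classes used in practice.

```python
# Minimal sketch: predict drug sensitivity (e.g., log IC50) from gene
# expression with regularized regression. All data below are synthetic
# placeholders standing in for a cell-line drug-screening panel.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))  # 200 cell lines x 1,000 expression features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=200)  # synthetic response

model = Ridge(alpha=10.0)  # L2 penalty guards against p >> n overfitting
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")
```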
From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture
Vol. 1 (2018), pp. 29–51

With the recent transformative developments in single-cell genomics and, in particular, single-cell gene expression analysis, it is now possible to study tissues at the single-cell level, rather than having to rely on data from bulk measurements. Here we review the rapid developments in single-cell RNA sequencing (scRNA-seq) protocols that have the potential for unbiased identification and profiling of all cell types within a tissue or organism. In addition, novel approaches for spatial profiling of gene expression allow us to map individual cells and cell types back into the three-dimensional context of organs. The combination of in-depth single-cell and spatial gene expression data will reveal tissue architecture in unprecedented detail, generating a wealth of biological knowledge and a better understanding of many diseases.
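As a toy illustration of the computational side of cell-type identification (not any specific protocol from the review), the following sketch normalizes a simulated count matrix, reduces it with PCA, and clusters cells with k-means; real pipelines add quality control, feature selection, and usually graph-based clustering.

```python
# Toy sketch of the core computational step in scRNA-seq cell-type
# identification: normalize counts, reduce dimensions, cluster cells.
# The count matrix is simulated, not real sequencing output.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
counts = rng.poisson(lam=2.0, size=(500, 2000))  # 500 cells x 2,000 genes

# Library-size normalization followed by a log transform
scaled = counts / counts.sum(axis=1, keepdims=True) * 1e4
logged = np.log1p(scaled)

embedding = PCA(n_components=20).fit_transform(logged)  # compress to 20 PCs
labels = KMeans(n_clusters=5, n_init=10).fit_predict(embedding)
print(np.bincount(labels))  # number of cells per putative type
```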
Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models
Vol. 1 (2018), pp. 53–68

With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, we explore future research directions.
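To ground the rule-based end of this spectrum, here is a minimal sketch of a hypothetical phenotype definition. The ICD-10 codes, medication names, record layout, and thresholds are illustrative only, not a validated algorithm.

```python
# Sketch of a rule-based phenotype: flag a patient as a type 2 diabetes
# case given at least two T2DM diagnosis codes plus an antidiabetic
# medication. Codes and thresholds are illustrative, not validated.
T2DM_CODES = {"E11.9", "E11.65"}        # illustrative ICD-10 codes
T2DM_MEDS = {"metformin", "glipizide"}  # illustrative medication list

def is_t2dm_case(record):
    dx_hits = sum(code in T2DM_CODES for code in record["dx_codes"])
    med_hit = any(med in T2DM_MEDS for med in record["medications"])
    return dx_hits >= 2 and med_hit

patient = {"dx_codes": ["E11.9", "E11.9", "I10"],
           "medications": ["metformin", "lisinopril"]}
print(is_t2dm_case(patient))  # True
```

Machine learning approaches replace the hand-written rule with a classifier trained on labeled charts, trading transparency for coverage of cases the rule misses.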
Defining Phenotypes from Clinical Data to Drive Genomic Research
Vol. 1 (2018), pp. 69–92

The rise in available longitudinal patient information in electronic health records (EHRs) and their coupling to DNA biobanks have resulted in a dramatic increase in genomic research using EHR data for phenotypic information. EHRs have the benefit of providing a deep and broad data source of health-related phenotypes, including drug response traits, expanding the phenomes available to researchers for discovery. The earliest efforts at repurposing EHR data for research involved manual chart review of limited numbers of patients, but such efforts now typically apply rule-based and machine learning algorithms to sometimes huge corpora for both genome-wide and phenome-wide approaches. In this review, we highlight the current methods, impact, challenges, and opportunities for repurposing clinical data to define patient phenotypes for genomic discovery. Use of EHR data has proven a powerful method for elucidating genomic influences on diseases, traits, and drug-response phenotypes and will continue to have increasing applications in large cohort studies.
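As a sketch of the phenome-wide idea, the following hypothetical scan tests a single variant against several simulated phenotypes with logistic regression. The phecode labels, case definitions, and absence of covariate adjustment and multiple-testing correction are all simplifications.

```python
# Minimal sketch of a phenome-wide association scan (PheWAS): test one
# genetic variant against many EHR-derived phenotypes. Data are
# simulated; real analyses adjust for covariates and multiple testing.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
genotype = rng.integers(0, 3, size=n)  # 0/1/2 minor-allele counts

results = {}
for pheno in ["phe_008", "phe_250", "phe_401"]:  # illustrative phecodes
    cases = rng.random(n) < 0.1                  # placeholder case status
    X = sm.add_constant(genotype.astype(float))
    fit = sm.Logit(cases.astype(float), X).fit(disp=0)
    results[pheno] = fit.pvalues[1]              # p-value for the variant

print(results)
```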
Alignment-Free Sequence Analysis and Applications
Vol. 1 (2018), pp. 93–114

Genome and metagenome comparisons based on large amounts of next-generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems, including the comparison of gene regulatory regions, genome sequences, and metagenomes; the binning of contigs in metagenomic data; the identification of virus–host interactions; and the detection of horizontal gene transfer. We provide an updated review of these applications and other related developments of word count–based approaches for alignment-free sequence analysis.
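To make the word-count idea concrete, here is a minimal Python sketch of the D2 statistic, which is simply the inner product of two sequences' k-mer count vectors. The sequences and the choice of k = 4 are illustrative, and the centered variants (D2S, D2*) from this literature are omitted.

```python
# Sketch of a word-count (k-mer) comparison between two sequences: the
# D2 statistic is the inner product of the k-mer count vectors.
from collections import Counter

def kmer_counts(seq, k=4):
    """Count all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k=4):
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(a[w] * b[w] for w in a.keys() & b.keys())

s1 = "ACGTACGTGGTTACGT"
s2 = "ACGTTGCAACGTACGT"
print(d2(s1, s2))  # raw D2; D2S and D2* additionally center the counts
```

Because only word counts are needed, the comparison never constructs an alignment, which is what keeps these methods efficient on large read sets.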
Privacy Policy and Technology in Biomedical Data Science
Vol. 1 (2018), pp. 115–129

Privacy is an important consideration when sharing clinical data, which often contain sensitive information. Adequate protection to safeguard patient privacy and to increase public trust in biomedical research is paramount. This review covers topics in policy and technology in the context of clinical data sharing. We review policy articles related to (a) the Common Rule, HIPAA privacy and security rules, and governance; (b) patients’ viewpoints and consent practices; and (c) research ethics. We identify key features of the revised Common Rule and the most notable changes since its previous version. We address data governance for research in addition to the increasing emphasis on ethical and social implications. Research ethics topics include data sharing best practices, use of data from populations of low socioeconomic status (SES), recent updates to institutional review board (IRB) processes to protect human subjects’ data, and important concerns about the limitations of current policies to address data deidentification. In terms of technology, we focus on articles that are applicable in real-world health care settings: deidentification methods that comply with HIPAA, data anonymization approaches that address well-acknowledged weaknesses of deidentified data, encryption methods to safeguard data analyses, and privacy-preserving predictive modeling. The first two technology topics mostly concern methods that sanitize structured or unstructured data. The third topic covers analysis of encrypted data. The last topic covers various mechanisms for building statistical models without sharing raw data.
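As one concrete example from the anonymization literature, the sketch below checks k-anonymity, i.e., that every combination of quasi-identifier values is shared by at least k records before release. The field names, records, and threshold are illustrative.

```python
# Sketch of a k-anonymity check over quasi-identifiers. A release
# satisfies k-anonymity when every quasi-identifier combination occurs
# in at least k records. Fields and threshold are illustrative.
from collections import Counter

def satisfies_k_anonymity(records, quasi_ids, k=5):
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

data = [{"zip3": "021", "age_band": "40-49", "dx": "E11.9"}] * 6
print(satisfies_k_anonymity(data, quasi_ids=["zip3", "age_band"], k=5))  # True
```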
Opportunities and Challenges of Whole-Cell and -Tissue Simulations of the Outer Retina in Health and Disease
Vol. 1 (2018), pp. 131–152

Visual processing starts in the outer retina, where photoreceptor cells sense photons that trigger electrical responses. Retinal pigment epithelial cells are located external to the photoreceptor layer and have critical functions in supporting cell and tissue homeostasis and thus sustaining a healthy retina. The high level of specialization makes the retina vulnerable to alterations that promote retinal degeneration. In this review, we discuss opportunities and challenges in proposing whole-cell and -tissue simulations of the human outer retina. An implicit position taken throughout this review is that mapping diverse data sets onto integrative computational models is likely to be a pivotal approach to understanding complex disease and developing novel interventions.
Network Analysis as a Grand Unifier in Biomedical Data Science
Vol. 1 (2018), pp. 153–180

Biomedical data scientists study many types of networks, ranging from those formed by neurons to those created by molecular interactions. People often criticize these networks as uninterpretable diagrams termed hairballs; however, here we show that molecular biological networks can be interpreted in several straightforward ways. First, we can break down a network into smaller components, focusing on individual pathways and modules. Second, we can compute global statistics describing the network as a whole. Third, we can compare networks. These comparisons can be within the same context (e.g., between two gene regulatory networks) or cross-disciplinary (e.g., between regulatory networks and governmental hierarchies). The latter comparisons can transfer a formalism, such as that for Markov chains, from one context to another or relate our intuitions in a familiar setting (e.g., social networks) to the relatively unfamiliar molecular context. Finally, key aspects of molecular networks are dynamics and evolution, i.e., how they change over time and how genetic variants affect them. By studying the relationships between variants in networks, we can begin to interpret many common diseases, such as cancer and heart disease.
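To illustrate the global-statistics view, the sketch below builds a small interaction graph and summarizes it as a whole; the edges are illustrative stand-ins for a real protein–protein interaction network, and the networkx library is assumed to be available.

```python
# Sketch of the "global statistics" view of a molecular network: build
# a small interaction graph and describe it as a whole rather than
# reading the hairball edge by edge. Edges are illustrative.
import networkx as nx

G = nx.Graph([("TP53", "MDM2"), ("TP53", "ATM"), ("MDM2", "MDM4"),
              ("ATM", "CHEK2"), ("CHEK2", "TP53")])

print("nodes/edges:", G.number_of_nodes(), G.number_of_edges())
print("average clustering:", nx.average_clustering(G))
print("top hubs:", sorted(G.degree, key=lambda kv: -kv[1])[:3])
```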
Deep Learning in Biomedical Data Science
Vol. 1 (2018), pp. 181–205

Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field and surveys current applications of deep learning to biomedical data, organized around five subareas of roughly increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Computational Methods for Understanding Mass Spectrometry–Based Shotgun Proteomics Data
Vol. 1 (2018), pp. 207–234

Computational proteomics is the data science concerned with the identification and quantification of proteins from high-throughput data and the biological interpretation of their concentration changes, posttranslational modifications, interactions, and subcellular localizations. Today, these data most often originate from mass spectrometry–based shotgun proteomics experiments. In this review, we survey computational methods for the analysis of such proteomics data, focusing on the explanation of the key concepts. Starting with mass spectrometric feature detection, we then cover methods for the identification of peptides. We subsequently turn to protein inference and the control of false discovery rates, two topics of high importance. We then discuss methods for the quantification of peptides and proteins. A section on downstream data analysis covers exploratory statistics, network analysis, machine learning, and multiomics data integration. Finally, we discuss current developments and provide an outlook on what the near future of computational proteomics might hold.
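To make the false-discovery-rate step concrete, here is a sketch of the standard target-decoy estimate on simulated scores: at a score threshold t, the FDR among accepted target matches is approximated by the number of decoy matches above t divided by the number of target matches above t.

```python
# Sketch of target-decoy FDR control for peptide-spectrum matches.
# Target scores mix true and false hits; decoy scores model the
# false-hit score distribution. All scores here are simulated.
import numpy as np

rng = np.random.default_rng(3)
target_scores = rng.normal(2.0, 1.0, size=5000)
decoy_scores = rng.normal(0.0, 1.0, size=5000)

def fdr_at(threshold):
    decoys = (decoy_scores >= threshold).sum()
    targets = max((target_scores >= threshold).sum(), 1)
    return decoys / targets

for t in (1.0, 2.0, 3.0):
    print(f"threshold {t}: estimated FDR = {fdr_at(t):.3f}")
```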
Data Science Issues in Studying Protein–RNA Interactions with CLIP Technologies
Vol. 1 (2018), pp. 235–261

An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein–RNA interactions. UV crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that copurify with a selected RNA-binding protein under stringent conditions. Here we focus on approaches for the analysis of the resulting data and appraise the methods for peak calling, visualization, analysis, and computational modeling of protein–RNA binding sites. We advocate that the sensitivity and specificity of data be assessed in combination for computational quality control. Moreover, we demonstrate the value of analyzing sequence motif enrichment in peaks assigned from CLIP data and of visualizing RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these analyses to assess how variations in CLIP data quality and in peak calling methods affect insights into regulatory mechanisms. We conclude by discussing future opportunities for the computational analysis of protein–RNA interaction experiments.
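As a minimal illustration of motif enrichment in peaks, the sketch below compares a candidate motif's frequency in toy peak sequences against naive per-sequence shuffles; real analyses use genuine peak calls and shuffles that preserve dinucleotide content.

```python
# Sketch of motif enrichment in CLIP peaks: compare motif occurrences
# in peak sequences against shuffled versions of the same sequences.
# The motif and the peak sequences are illustrative.
import random

def motif_count(seqs, motif):
    return sum(seq.count(motif) for seq in seqs)

def shuffled(seq, rng):
    chars = list(seq)
    rng.shuffle(chars)  # naive shuffle; real analyses preserve dinucleotides
    return "".join(chars)

peaks = ["UGUAUGUAUGCC", "AAUGUAUGUGUA", "GGUGUAUGUAUC"]
rng = random.Random(4)
observed = motif_count(peaks, "UGUA")
background = [motif_count([shuffled(s, rng) for s in peaks], "UGUA")
              for _ in range(1000)]
p_emp = sum(b >= observed for b in background) / len(background)
print(f"observed={observed}, empirical p={p_emp:.3f}")
```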
Large-Scale Analysis of Genetic and Clinical Patient Data
Vol. 1 (2018), pp. 263–274

Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, I discuss state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics. This field is changing and adapting to the novel data types made available, as well as to technological advances in computation and machine learning. Thus, I also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.
Visualization of Biomedical Data
Vol. 1 (2018), pp. 275–304

The rapid increase in the volume and complexity of biomedical data requires changes in research, communication, and clinical practices. This includes learning how to effectively integrate automated analysis with high-data-density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that help address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including three-dimensional genomics, single-cell RNA sequencing (RNA-seq), the protein structure universe, phosphoproteomics, augmented reality–assisted surgery, and metagenomics. While specific research areas need highly tailored visualizations, there are common challenges that can be addressed with general methods and strategies. Also common, however, are poor visualization practices. We outline ongoing initiatives aimed at improving visualization practices in biomedical research via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers. These changes are revolutionizing how we see and think about our data.
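As one small example of a general strategy for high-data-density plots, the sketch below replaces an overplotted scatter with a hexagonal-binned density view using matplotlib; the data are simulated.

```python
# Sketch of one general high-data-density strategy: swap an
# overplotted scatter of 50,000 points for a hexbin density view.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
x, y = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], 50000).T

fig, ax = plt.subplots(figsize=(4, 4))
hb = ax.hexbin(x, y, gridsize=40, cmap="viridis")  # density, not points
fig.colorbar(hb, label="points per bin")
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
fig.savefig("density.png", dpi=150)
```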
A Census of Disease Ontologies
Vol. 1 (2018), pp. 305–331

For centuries, humans have sought to classify diseases based on phenotypic presentation and available treatments. Today, a wide landscape of strategies, resources, and tools exists to classify patients and diseases. Ontologies can provide a robust foundation of logic for precise stratification and classification along diverse axes such as etiology, development, treatment, and genetics. Disease and phenotype ontologies are used in four primary ways: (a) search, retrieval, and annotation of knowledge; (b) data integration and analysis; (c) clinical decision support; and (d) knowledge discovery. Computational inference can connect existing knowledge and generate new insights and hypotheses about drug targets, prognosis prediction, or diagnosis. In this review, we examine the rise of disease and phenotype ontologies and the diverse ways they are represented and applied in biomedicine.
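To illustrate the kind of computational inference ontologies enable, here is a toy sketch of subsumption reasoning over a hand-written is-a graph; the terms are illustrative stand-ins for a real disease ontology.

```python
# Sketch of subsumption reasoning over an is-a graph: walking up the
# hierarchy lets a query for a general term also match annotations to
# its descendants. Terms below are illustrative, not a real ontology.
IS_A = {
    "type 2 diabetes": ["diabetes mellitus"],
    "type 1 diabetes": ["diabetes mellitus"],
    "diabetes mellitus": ["metabolic disease"],
    "metabolic disease": ["disease"],
}

def ancestors(term):
    """Collect all transitive is-a parents of a term."""
    out, stack = set(), [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out

print(ancestors("type 2 diabetes"))
# {'diabetes mellitus', 'metabolic disease', 'disease'}
```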