Annual Review of Biomedical Data Science - Early Publication
Reviews in Advance appear online ahead of the full published volume.
Algorithm-Based Clinical Decision Support: Evolving Regulatory Landscape and Best Practices for Local Oversight
First published online: 21 April 2025
The potential of algorithm-based clinical decision support (CDS) in healthcare continues to increase with the growing field of artificial intelligence (AI)-enabled CDS. The use of these technologies to support clinicians, patients, and health systems is still quite new, and to date, implementors and regulators are still identifying the best processes and practices to ensure the effective, safe, and equitable use of these technology solutions. To assist individuals and organizations interested in implementation of algorithm-based CDS and AI-enabled CDS in healthcare, this article reviews the important regulatory decisions that form the landscape within which algorithm-based CDS has emerged, modern governance frameworks used to oversee these CDS systems, nuances in evaluation and monitoring throughout the CDS life cycle, best practices for real-world implementation, safety and equity considerations, and avenues for future collaboration and innovation.
Biomedical Natural Language Processing in the Era of Large Language Models
First published online: 17 April 2025
Biomedicine has rapidly digitized over recent decades, from genomic sequencing to electronic medical records. Now, the rise of large language models (LLMs) is driving a generative artificial intelligence (AI) revolution in natural language processing (NLP). Together, these trends create unprecedented possibilities to optimize patient care and accelerate biomedical discovery. Biomedical NLP already boosts productivity by automating labor-intensive tasks such as knowledge extraction and medical abstraction. Emerging approaches promise creativity gain, surpassing standard healthcare practices and uncovering emergent capabilities through Web-scale biomedical knowledge and population-level patient data. However, LLMs remain prone to hallucinations and omissions, and ensuring compliance and safety is vital in order to do no harm. Incorporating diverse modalities such as imaging and genomics is also essential for comprehensive solutions. We review these challenges and opportunities in biomedical NLP, offering historical context, surveying the current state of the art, and exploring frontiers for AI researchers and biomedical practitioners.
Network-Based Approaches for Drug Target Identification
First published online: 16 April 2025
Drug target identification is the first step in drug development, and its importance is underscored by the fact that, even when using genetic evidence to improve success rates, only a small fraction of lead targets end up approved for use in the clinic. One of the reasons for this is the lack of in-depth understanding of the complexity of human diseases.
In this review, we argue that network-based approaches, which can capture relationships between relevant genes and proteins as well as integrate diverse data modalities, have high potential for improving drug target identification and drug repurposing. We present the evolution of network-based methods developed for this purpose and discuss the limitations that are holding these approaches back from making an impact in the clinic. We finish by presenting our recommendations for overcoming these limitations, for example, by leveraging emerging technologies such as artificial intelligence and knowledge graphs.
The TITAN-X Platform Integrates Big Data, Artificial Intelligence, Bioinformatics, and Advanced Computational Modeling to Understand Immune Responses and Develop the Next Wave of Precision Medicines
First published online: 16 April 2025
The TITAN-X Precision Medicine Platform was engineered to rapidly, fully, and efficiently utilize large-scale immunology datasets, including public data, in drug discovery and development. TITAN-X integrates big data with artificial intelligence (AI), bioinformatics, and advanced computational modeling to seamlessly transition from early target discovery to clinical testing of new therapeutics, developing biomarker-driven precision medicines tailored to specific patient populations. We illustrate the capabilities of TITAN-X through four case studies, demonstrating its use in computationally driven target discovery; characterization of novel immunometabolic mechanisms in infectious, inflammatory, and autoimmune diseases; and identification of biomarker signatures for patient stratification in clinical trials designed to maximize therapeutic efficacy and safety. Data-driven and AI-powered approaches like TITAN-X are enhancing the pace of drug development, reducing costs, tailoring treatments, and increasing the probability of success in clinical trials.
Curriculum Design in an Evolving Field: Perspectives on Biomedical Data Science from Stanford
First published online: 09 April 2025
In recent decades, there has been an explosion of data streams spanning the entire spectrum of biomedicine, opening novel opportunities to tackle biological and medical research questions and increasing our ability to provide effective and efficient health care. In parallel, augmented computational power has allowed the development and deployment of quantitative approaches at unprecedented scales. To effectively take advantage of this progress, it is important to invest in the training of a new generation of biomedical data scientists. Designing a graduate curriculum against the backdrop of a rapidly changing landscape of data, methods, and computing power demands flexibility and openness to adaptation. At the same time, we strive to ensure that the students acquire foundational competencies that might fuel productive and evolving careers, without being constrained to and defined by a niche trendy topic. We offer here a view of graduate training in biomedical data science from the standpoint of our experience at Stanford University. We conclude with a series of open challenges, the answers to which we believe will shape training in biomedical data science.
Updated on April 29, 2025
Generative Artificial Intelligence: Implications for Biomedical and Health Professions Education
First published online: 09 April 2025
Generative artificial intelligence (AI) has had a profound impact on biomedicine and health, both in professional work and in education. Based on large language models (LLMs), generative AI has been found to perform as well as humans in simulated situations taking medical board exams, answering clinical questions, solving clinical cases, applying clinical reasoning, and summarizing information. Generative AI is also being used widely in education, performing well in academic courses and their assessments. This review summarizes the successes of LLMs and highlights some of their challenges in the context of education, most notably aspects that may undermine the acquisition of knowledge and skills for professional work. It then provides recommendations for best practices to overcome the shortcomings of LLM use in education. Although there are challenges for the use of generative AI in education, all students and faculty, in biomedicine and health and beyond, must understand it and be competent in its use.
From Prediction to Prescription: Machine Learning and Causal Inference for the Heterogeneous Treatment Effect
First published online: 09 April 2025
The increasing accumulation of medical data brings the hope of data-driven medical decision-making, but data's increasing complexity—as text or images in electronic health records—calls for complex models, such as machine learning. Here, we review how machine learning can be used to inform decisions for individualized interventions, a causal question. Going from prediction to causal effects is challenging, as no individual is ever observed both treated and untreated. We detail how some data can support some causal claims and how to build causal estimators with machine learning. Beyond variable selection to adjust for confounding bias, we cover the broader notions of study design that make or break causal inference. As the problems span diverse scientific communities, we use didactic yet statistically precise formulations to bridge machine learning to epidemiology.
Strategies for Creating Robust Patient Groups to Study Diverse Conditions with Electronic Health Records
First published online: 08 April 2025
The growth of electronic health record (EHR) databases in size and availability has created an unprecedented opportunity to better understand human health and disease. However, conducting robust EHR studies requires careful filtering criteria and study design, as EHRs pose several challenges that can confound analyses and lead to inaccurate results. Here we review these challenges and make suggestions about how to avoid or adjust for major confounders and biases in common EHR study designs. We further highlight qualities of EHR data that make different diseases more or less feasible for study. These recommendations for conducting research using EHRs will help inform database selection, improve reproducibility of results across the field, and enhance the validity of study results.
Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare
First published online: 08 April 2025
Large language models (LLMs) have gained significant attention in the medical domain for their human-level capabilities, leading to increased efforts to explore their potential in various healthcare applications. However, despite this promise, multiple challenges and obstacles remain for their real-world use in practical settings. This work discusses key challenges for LLMs in medical applications from four unique aspects: operational vulnerabilities, ethical and social considerations, performance and assessment difficulties, and legal and regulatory compliance. Addressing these challenges is crucial for leveraging LLMs to their full potential and ensuring their responsible integration into healthcare.
Revisiting Technical Bias Mitigation Strategies
First published online: 08 April 2025
Efforts to mitigate bias and enhance fairness in the artificial intelligence (AI) community have predominantly focused on technical solutions. While numerous reviews have addressed bias in AI, this review uniquely focuses on the practical limitations of technical solutions in healthcare settings, providing a structured analysis across five key dimensions affecting their real-world implementation: who defines bias and fairness; which mitigation strategy to use and prioritize among dozens that are inconsistent and incompatible; when in the AI development stages the solutions are most effective; for which populations; and the context for which the solutions are designed. We illustrate each limitation with empirical studies focusing on healthcare and biomedical applications. Moreover, we discuss how value-sensitive AI, a framework derived from technology design, can engage stakeholders and ensure that their values are embodied in bias and fairness mitigation solutions. Finally, we discuss areas that require further investigation and provide practical recommendations to address the limitations covered in this review.
The Development Landscape of Large Language Models for Biomedical Applications
First published online: 01 April 2025
Large language models (LLMs) have become powerful tools for biomedical applications, offering potential to transform healthcare and medical research. Since the release of ChatGPT in 2022, there has been a surge in LLMs for diverse biomedical applications. This review examines the landscape of text-based biomedical LLM development, analyzing model characteristics (e.g., architecture), development processes (e.g., training strategy), and applications (e.g., chatbots). Following PRISMA guidelines, 82 articles were selected out of 5,512 articles since 2022 that met our rigorous criteria, including the requirement of using biomedical data when training LLMs. Findings highlight the predominant use of decoder-only architectures such as Llama 7B, prevalence of task-specific fine-tuning, and reliance on biomedical literature for training. Challenges persist in balancing data openness with privacy concerns and detailing model development, including computational resources used. Future efforts would benefit from multimodal integration, LLMs for specialized medical applications, and improved data sharing and model accessibility.
Integrative Data Science in Drug Safety Research: Experiences, Challenges, and Perspectives
First published online: 01 April 2025
Pharmaceutical research and development largely depend on the quantity and quality of data that are available to support projects. The secondary use of data by means of collaborative and integrative approaches is yielding promising results in drug safety research. However, there are challenges that must be overcome in these integrative approaches, such as interoperability issues, intellectual property protection, and, in the case of clinical information, personal data safeguards. The OMOP common data model and the EHDEN and DARWIN EU platforms constitute successful examples of data sharing initiatives in the clinical domain, while the eTOX, eTRANSAFE, and VICT3R international projects are examples of corporate data sharing in toxicology research. The VICT3R project is using these shared data for generating virtual control groups to be applied in nonclinical drug safety assessment. Drug-related knowledge bases that integrate information from different sources also constitute useful tools in the drug safety domain.
Clinical Text Generation: Are We There Yet?
First published online: 18 March 2025
Generative artificial intelligence (AI), operationalized as large language models, is increasingly used in the biomedical field to assist with a range of text processing tasks, including text classification, information extraction, and decision support. In this article, we focus on the primary purpose of generative language models, namely the production of unstructured text. We review past and current methods used to generate text as well as methods for evaluating open text generation, i.e., in contexts where no reference text is available for comparison. We discuss clinical applications that can benefit from high-quality, ethically designed text generation, such as clinical note generation and synthetic text generation in support of secondary use of health data. We also raise awareness of the risks involved with generative AI, such as overconfidence in outputs due to anthropomorphism and the risk of representational and allocation harms due to biases.
Generative Artificial Intelligence in Medicine
First published online: 18 March 2025
The increased capabilities of generative artificial intelligence (AI) have dramatically expanded its possible use cases in medicine. We provide a comprehensive overview of generative AI use cases for clinicians, patients, clinical trial organizers, researchers, and trainees. We then discuss the many challenges—including maintaining privacy and security, improving transparency and interpretability, upholding equity, and rigorously evaluating models—that must be overcome to realize this potential, as well as the open research directions they give rise to.
Genetic Studies Through the Lens of Gene Networks
First published online: 20 February 2025
Understanding the genetic basis of complex traits is a longstanding challenge in the field of genomics. Genome-wide association studies have identified thousands of variant–trait associations, but most of these variants are located in noncoding regions, making the link to biological function elusive. While traditional approaches, such as transcriptome-wide association studies (TWAS), have advanced our understanding by linking genetic variants to gene expression, they often overlook gene–gene interactions. Here, we review current approaches to integrate different molecular data, leveraging machine learning methods to identify gene modules based on coexpression and functional relationships. These integrative approaches, such as PhenoPLIER, combine TWAS and drug-induced transcriptional profiles to effectively capture biologically meaningful gene networks. This integration provides a context-specific understanding of disease processes while highlighting both core and peripheral genes. These insights pave the way for novel therapeutic targets and enhance the interpretability of genetic studies in personalized medicine.
Evaluation and Regulation of Artificial Intelligence Medical Devices for Clinical Decision Support
First published online: 19 February 2025
Artificial intelligence (AI) methods were first developed nearly seven decades ago. Only in recent years have they demonstrated their potential to improve clinical care at the bedside. AI systems are now capable of interpreting, predicting, and even generating important medical information. AI medical devices share many similarities with traditional medical devices but also diverge from them in important ways. Despite widespread optimism and enthusiasm surrounding the use of such devices to improve care processes, patient outcomes, and the healthcare experience for patients, caregivers, and clinicians alike, little evidence exists so far for their effectiveness in practice. Even less is known about the safety or equity of AI medical devices. As with any new technology, this exciting time is accompanied by appropriate questions regarding whether, how much, when, and whom such AI systems really help. Different stakeholders, ranging from patients to clinicians to industry device developers, may have divergent preferences or assessments of risk and benefits, warranting an informed public discussion to guide emerging regulatory efforts. This review summarizes the rapidly evolving recent efforts and evidence related to the regulation and evaluation of AI medical devices and highlights opportunities for future work to ensure their effectiveness, safety, and equity.
Foundation Models for Translational Cancer Biology
First published online: 29 January 2025
Cancer remains a leading cause of death globally. The complexity and diversity of cancer-related datasets across different specialties pose challenges in refining precision medicine for oncology. Foundation models offer a promising solution. Trained on vast amounts of data, these models develop a broad understanding across a wide range of tasks. We examine the role of foundation models in domains relevant to cancer research, including natural language processing, computer vision, molecular biology, and cheminformatics. Through a review of state-of-the-art methods, we explore how these models have already advanced translational cancer research goals such as precision tumor classification and artificial intelligence–assisted surgery. We also discuss prospective advances in areas like early tumor detection, personalized cancer treatment, and drug discovery. This review provides researchers with a curated set of resources and methodologies, offers practitioners a deeper understanding of how these models enhance cancer care, and points to opportunities for future applications of foundation models in cancer research.
Conditional Generative Models for Synthetic Tabular Data: Applications for Precision Medicine and Diverse Representations
Kara Liu and Russ B. Altman
First published online: 14 January 2025
Tabular medical datasets, like electronic health records (EHRs), biobanks, and structured clinical trial data, are rich sources of information with the potential to advance precision medicine and optimize patient care. However, real-world medical datasets have limited patient diversity and cannot simulate hypothetical outcomes, both of which are necessary for equitable and effective medical research. Fueled by recent advancements in machine learning, generative models offer a promising solution to these data limitations by generating enhanced synthetic data. This review highlights the potential of conditional generative models (CGMs) to create patient-specific synthetic data for a variety of precision medicine applications. We survey CGM approaches that tackle two medical applications: correcting for data representation biases and simulating digital health twins. We additionally explore how the surveyed methods handle modeling tabular medical data and briefly discuss evaluation criteria. Finally, we summarize the technical, medical, and ethical challenges that must be addressed before CGMs can be effectively and safely deployed in the medical field.
Spatial Transcriptomics Brings New Challenges and Opportunities for Trajectory Inference
First published online: 14 November 2024
Spatial transcriptomics (ST) brings new dimensions to the analysis of single-cell data. While some methods for data analysis can be ported over without major modifications, they are the exception rather than the rule. Trajectory inference (TI) methods in particular can suffer from significant challenges due to spatial batch effects in ST data, which can add independent sources of noise to each time point. Pioneering methods for TI on ST data have focused primarily on addressing the batch effects in physical arrangement, i.e., where tissues are deformed in different ways at different time points. However, other challenges arise due to the measurement granularity of ST technologies, as well as a bias from slicing. In this review, we examine the sources of these challenges, and we explore how they are addressed by current state-of-the-art TI methods for ST data. We conclude by highlighting some opportunities for future method development.