Machine Learning in Epidemiology and Health Outcomes Research


Annual Review of Public Health

Vol. 41:21-36 (Volume publication date April 2020)
First published as a Review in Advance on October 2, 2019
https://doi.org/10.1146/annurev-publhealth-040119-094437

Timothy L. Wiemken1 and Robert R. Kelley2

1Center for Health Outcomes Research, Saint Louis University, Saint Louis, Missouri 63104, USA; email: [email protected]

2Department of Computer Science, Bellarmine University, Louisville, Kentucky 40205, USA; email: [email protected]

Copyright © 2020 by Annual Reviews. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See credit lines of images or other third party material in this article for license information.

Abstract

Machine learning approaches to modeling epidemiologic data are becoming increasingly prevalent in the literature. These methods have the potential to improve our understanding of health and opportunities for intervention, far beyond our past capabilities. This article provides a walkthrough for creating supervised machine learning models, with current examples from the literature. From identifying an appropriate sample and selecting features through training, testing, and assessing performance, the end-to-end approach to machine learning can be a daunting task. We take the reader through each step in the process and discuss novel concepts in the area of machine learning, including identifying treatment effects and explaining the output from machine learning models.

Keywords

predictive modeling, artificial intelligence, deep learning, treatment effects, walkthrough, biostatistics

INTRODUCTION TO EPIDEMIOLOGY

Epidemiology is defined as the study of the distribution and determinants of disease (25). Improvements in population health and increased survival rates in humans can be traced to interventions developed from evidence obtained through epidemiologic study (41). Data analytics is one of the most critical underlying aspects of epidemiology; increasing computational power over the past decade has vastly expanded our modeling capabilities and approaches (23). Because of the variety of areas of study in epidemiology and the unique needs of each, novel computational modeling strategies are highly prevalent in the scientific literature.

HISTORICAL RATIONALE FOR STATISTICAL MODELING

Frequentist statistical methodologies are the most commonly used approaches to analytics in epidemiologic studies to date (27). These methods are often confusing to nonstatisticians and are steeped in the development of hypotheses and the calculation of probabilities that offer support for or against rejection of these hypotheses. Basic statistical tests and multivariable regression modeling are commonly used for testing hypotheses to define associations or treatment effects between predictor and outcome variables under study. These traditional statistical approaches are used in what is coined as the “data culture” (12).

Traditional regression-type modeling of health outcomes in epidemiology can be categorized by the purpose of the model, whether it is necessary to predict a dependent variable given multiple independent variables (e.g., predictive models) or to produce a measure of treatment effect or magnitude and statistical association of individual independent variables on the dependent variable (e.g., explanatory models) (57). Each modeling strategy provides useful information for investigators and practitioners. Most traditional modeling approaches are data focused and make various assumptions about the data used within the model (12). Assumptions such as linearity, lack of multicollinearity, and proportional risk/odds/hazards over time are well understood by epidemiologists. As more data become available for analytics, Richard Bellman's “curse of dimensionality” becomes apparent (7). In this state, research questions become more advanced, traditional modeling assumptions become more difficult to meet, relationships are highly nonlinear, and new methods must be utilized. Novel approaches in machine learning have become a focus in medicine, with more limited use in population health (22) over the past several years. The purpose of this review is to document the uses, strategies, and approaches, as well as the advantages and disadvantages of machine learning models in the field of epidemiology and health outcomes research, with a main focus on supervised machine learning methods.

INTRODUCTION TO MACHINE LEARNING

“Machine learning” is an umbrella term used to describe a wide variety of models and strategies that focus on algorithmic modeling (45). In contrast, the term regression in epidemiology typically refers to a wide variety of frequentist regression models such as logistic, linear, and Cox proportional hazards often used in epidemiology and biostatistics (28).

The concept of machine learning has existed since the early 1950s, arising from efforts to have computers approximate the human thought process through pattern matching, recognition, and decision making (64). This work continued through the research of Arthur Samuel, who wrote a program to learn to play the board game checkers (53), and that of Frank Rosenblatt (50), who designed the first artificial neural network, which used the principles of neural biology to perform computation. Since that time, numerous machine learning algorithms have been developed to solve many learning problems. These algorithms are generally grouped into supervised or unsupervised models. Supervised models are typically used to predict an outcome (known as a label in machine learning), similar to predictive modeling using regression. Unsupervised models are typically used to discover unknown patterns in data, without respect to a particular label. In this review, we focus primarily on supervised models in epidemiology. Although these techniques have been around for many years, machine learning was not accepted as a Medline Medical Subject Heading (MeSH) term until 2016 (https://www.ncbi.nlm.nih.gov/mesh/2010029).

Although the term machine learning is often used in today's environment, “statistical learning” is also commonly used in the literature. This variation in terminology is due to several novel strategies that combine traditional frequentist biostatistical approaches, such as hypothesis testing, with algorithmic approaches typical of machine learning models (7). This practice further blurs the lines between traditional biostatistics and machine learning, resulting in the combined phrasing: statistical learning. Regardless, the machine learning literature utilizes different terms for similar concepts used in epidemiology. For the purposes of this review, we use machine learning terminology. For clarity, Table 1 displays common terms used in epidemiology with their corollaries in machine learning. Most notably, the term features refers to what epidemiologists would consider independent variables, whereas the term label refers to the dependent variable.

Table 1

Linking terms and phrases in epidemiology and machine learning

THE MACHINE LEARNING APPROACH

Setting up a machine learning model such that the predictions are valid and accurate can be a daunting task, not substantially different from developing a biostatistical regression model. Below, we describe the end-to-end process of machine learning (Figure 1), with examples in epidemiology and public health.

Figure 1 

Sample Size

The literature includes numerous approaches for identifying appropriate sample sizes for machine learning models (6, 21, 47). However, sample size estimates are difficult to compute because machine learning models are largely algorithm based. Most do not utilize frequentist statistical measures such as p-values, nor do they focus on effect sizes, two concepts central to the traditional calculation of sample sizes.

For unsupervised models, sample sizes should be based on the research question and inherent variability of the data under consideration. Because unsupervised models can be used for hypothesis generation or data reduction, sample size calculations may be unnecessary. For example, very small data sets (13 unique individuals) have been used to detect clusters of patients with similar inflammatory markers among hospitalized patients with pneumonia (70).

In short, many loose recommendations and heuristics are available for machine learning sample size estimation. Publications on machine learning sample size estimation are often discipline specific. For example, genetic epidemiology may require smaller data sets (e.g., <100 rows) (12) as compared with the moderate size (several hundred cases) necessary for a behavioral/cognition outcome evaluating functional magnetic resonance imaging (fMRI) (11). However, it is important to note that the optimal sample size depends on the data available, both the number of rows and the number and quality of features. If the included features are redundant or not predictive of the label, the model may be inaccurate regardless of the number of features. Furthermore, if there are many features but few instances of the label, models may have difficulty matching patterns to the label across the full feature space, and the model will be unlikely to function appropriately in production (29). Much like any modeling scheme, results can be generated regardless of the sample size. In machine learning, for the results to be accurate and generalizable, the overall sample size should be carefully considered a priori and may be much larger than anticipated (65).

Feature Selection

Parsimony is a central tenet of regression model building in epidemiology to prevent overfitting. Selecting the relevant predictors with the appropriate level of explanation is critical to the model's success. As in regression modeling, overfitting is a major concern in machine learning; parsimony is accomplished through feature selection, in which the features in a data set are thoughtfully chosen for the model. This step is especially important for machine learning models because they are often applied to data sets that were collected for reasons other than a specific hypothesis, as is the case with electronic medical record data (32) or genetic epidemiology data (40). Such data sets typically contain far more features than a regression model would use, many of which are irrelevant to the model being constructed. One example outlines a technique to detect and limit the set of variables to be used for modeling (known as the feature set) in antigen discovery for vaccinology (18), which could be applied to any ‘omics data set.

The difficulty in the selection of a parsimonious feature set is more complex than just the feature set's impact on a particular study outcome (9). There are many other reasons to reduce the number of features in one's training data set prior to using a machine learning model. First, the model will train faster, which is particularly attractive with complex modeling schemes under local computation as opposed to cluster computing. Second, reducing the number of redundant features or features that do not affect the outcome may decrease the likelihood of overfitting the model.

Feature selection can be done in numerous ways including selecting clinically meaningful features, simple correlations between features, and feature importance scores. Other machine learning models such as least absolute shrinkage and selection operator (LASSO) regression may also be useful for feature selection (62, 69). Genetic algorithms for feature selection have become popular and have been used for various purposes (63), including to understand the impact of uncontrolled comorbidities on clinical outcomes in hospitalized patients with pneumonia (3). Regardless of the method used for feature selection, investigators have suggested that the accuracy and stability of the model should be considered when using feature selection algorithms (21). Otherwise, these models risk overfitting.
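As a concrete sketch of the LASSO route to feature selection, the snippet below fits an L1-penalized regression to simulated data and keeps only the features with nonzero coefficients. The data, penalty value, and feature count are invented for illustration, not drawn from the review:

```python
# Sketch: LASSO-based feature selection on synthetic data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
# Only the first three features truly drive the outcome.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(scale=0.5, size=n)

# The L1 penalty shrinks uninformative coefficients to exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of retained features
print(selected)
```

In practice the penalty (alpha) would itself be chosen by cross-validation, and any candidate feature set would still be reviewed with domain experts, as the text emphasizes.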

In the era of ‘omics data, the feature set provided to epidemiologists for analytics has expanded substantially (22). Many approaches for selecting features have been proposed (24, 25). One example is ranked guided iterative feature elimination (RGIFE), which shows promise for identifying enhanced clinically relevant biomarkers (26) (see the sidebar titled Ranked Guided Iterative Feature Elimination for Feature Selection). Regardless, much like in explanatory regression model building, strict automation of feature selection is likely not an appropriate solution on its own. In nearly all areas, domain experts should be enlisted to assist in feature selection for meaningful models to be developed.

RANKED GUIDED ITERATIVE FEATURE ELIMINATION FOR FEATURE SELECTION

Issue: Data sets with a large number of features are difficult to use because relevant features are difficult to identify.

Solution: Ranked guided iterative feature elimination (RGIFE), an algorithm that uses cross-validation to identify relevant features in classification scenarios, is proposed. RGIFE first estimates the performance of a model with the original feature set using k-fold cross-validation. The model then ranks the importance of the features to the classification task. From there, the model removes attributes from the end of the feature ranking (the lowest-ranking features) and runs again. Reduced feature sets that perform within a tolerable level are accepted until the performance of the model falls below a specified threshold. The performance of this feature selection model was compared with several commonly used feature selection methods.

Conclusion: RGIFE provided similar prediction performance with few features for several cancer-related data sets.
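The sidebar's ranked-removal loop can be approximated in a few lines. This is a simplified sketch of the general idea, not the published RGIFE algorithm: features are ranked by random forest importance, the lowest-ranked feature is dropped, and the smaller set is kept only while cross-validated accuracy stays within an assumed tolerance:

```python
# Simplified ranked-removal loop in the spirit of RGIFE (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, p = 300, 10
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only two informative features

features = list(range(p))
tolerance = 0.03  # hypothetical: accept a smaller set if CV accuracy drops less than this
model = RandomForestClassifier(n_estimators=50, random_state=0)
baseline = cross_val_score(model, X[:, features], y, cv=5).mean()

while len(features) > 1:
    model.fit(X[:, features], y)
    worst = features[int(np.argmin(model.feature_importances_))]  # lowest-ranked feature
    candidate = [f for f in features if f != worst]
    score = cross_val_score(model, X[:, candidate], y, cv=5).mean()
    if score >= baseline - tolerance:
        features = candidate              # reduced set performs acceptably: keep it
        baseline = max(baseline, score)
    else:
        break                             # performance fell below the threshold: stop

print(sorted(features))
```

On data like this, the loop should discard most of the noise features while retaining the informative ones, mirroring the "similar performance with fewer features" result reported for RGIFE.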

Feature Engineering

Creating or engineering new features from available data to capture latent effects is another important facet of machine learning. Historically, feature engineering has been a manual and laborious process, limited by many factors including the mathematical expertise, time available for analysis, and domain knowledge of the study team.

Simple feature engineering, such as taking the logarithm of a continuous variable to change its distribution or aggregating two variables to account for multicollinearity, is sometimes necessary in traditional regression modeling. For example, the performance accuracy of machine learning models to predict early sepsis was improved by multiplying a shock index by age to derive a new feature for regression (19).

Feature engineering is becoming more complex, with the potential to uncover latent effects that would not be accounted for otherwise. One novel automated feature engineering approach is deep feature synthesis, which combines multiple feature transformations and aggregations, of any type or complexity, to create new features (33). Individual variables are precursors to these new deep features, created through primitives, mathematical formulas used to transform or aggregate. Primitives can range from simple to complex, including means, sums, principal components, or even predicted probabilities or error terms from traditional regression models. Aggregations are very useful for longitudinal features, whereas transformations are typically used for time-invariant features.
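To make the primitive idea concrete, here is a toy sketch of transformation and aggregation primitives applied to a hypothetical longitudinal laboratory table. All column names and values are invented; deep feature synthesis tooling automates and composes such steps:

```python
# Sketch: transformation primitives (row-wise) and aggregation primitives
# (collapsing a longitudinal record to one row per patient). Hypothetical data.
import numpy as np
import pandas as pd

labs = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "lactate":    [1.2, 2.8, 4.1, 0.9, 1.1],
    "heart_rate": [88, 104, 121, 72, 75],
    "sys_bp":     [118, 96, 84, 125, 122],
})

# Transformation primitives applied to each row.
labs["log_lactate"] = np.log(labs["lactate"])
labs["shock_index"] = labs["heart_rate"] / labs["sys_bp"]  # a common derived feature

# Aggregation primitives summarize the longitudinal record per patient.
patient_features = labs.groupby("patient_id").agg(
    mean_lactate=("lactate", "mean"),
    max_shock_index=("shock_index", "max"),
    n_measurements=("lactate", "count"),
).reset_index()
print(patient_features)
```

Chaining such primitives (e.g., taking the mean of an already-transformed column) is what produces the "deep" features described above.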

In our experience, approaches to feature engineering often merge with feature selection, especially for longitudinal data sets. For instance, in clinical epidemiology, when selecting features, investigators often must limit which laboratory values to include in a model because the number of time points and frequency at which laboratory data are collected during a hospital stay vary.

External data sources are becoming important components of accurate machine learning models. Affixing data collected outside the primary data set provides machine learning algorithms additional features from which to learn. In fact, researchers have suggested that different machine learning algorithms are unlikely to provide a substantial improvement in model performance if the same feature set is used for each (19, 25).

External data sets can be combined with primary data sets, including, among others, geographic location, weather data, and aggregated population statistics. For example, area deprivation indices have been used to predict health care outcomes. This deprivation score is an aggregate score developed from US Census data (58), which can be linked to individual-level data to provide some estimates of cluster effects. It has been successfully used to assist in the prediction of hospital readmission (35) and outcomes in hospitalized patients with community-acquired pneumonia (68). Investigators have used machine learning to aggregate Web search and location data, linked with restaurant data, to identify potentially unsafe restaurants (51). Aggregate data such as these have the capability to revolutionize the performance of model predictions in epidemiology.

Missing Data

Missing data, regardless of the mechanism creating the missingness, are an issue across all analytics. Many traditional regression models simply drop cases with missing data and run on the remaining records. The majority of machine learning models will not run with missing data at all; therefore, care to ensure data are complete is critical. One solution to this problem is data imputation, a technique to generate reasonable synthetic values when data are missing. Machine learning approaches have the opportunity to reduce imputation error by accounting for nonlinear relationships in the imputer (42). Examples of machine learning missing data imputers are rife in the literature, largely based on random forest approaches (11, 55, 60). In epidemiology, variations on this theme have been used to impute missing data to better define the role of age-mixing patterns in HIV transmission dynamics (5), burnout and stress relationships among health care workers (13), health care utilization in patients with spinal cord injuries (49), and treatment completion in patients with rape-onset post-traumatic stress disorder (34).
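A minimal single-pass sketch of random-forest-style imputation is shown below, in the spirit of the missForest family of imputers; a real imputer cycles over all incomplete columns until convergence, and the variables here are simulated:

```python
# Sketch: impute one incomplete column with a random forest fit on complete cases.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 400
age = rng.uniform(20, 90, size=n)
bmi = 18 + 0.1 * age + rng.normal(scale=1.0, size=n)  # BMI depends on age

# Knock out 20% of BMI values completely at random.
missing = rng.random(n) < 0.2
bmi_obs = bmi.copy()
bmi_obs[missing] = np.nan

# Fit on complete cases only, then predict the missing values from age.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(age[~missing].reshape(-1, 1), bmi_obs[~missing])
bmi_imputed = bmi_obs.copy()
bmi_imputed[missing] = rf.predict(age[missing].reshape(-1, 1))

print(np.abs(bmi_imputed[missing] - bmi[missing]).mean())  # mean imputation error
```

Because the forest learns the age-BMI relationship rather than substituting a single column mean, the imputed values track the nonlinear structure of the data.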

Classification or Regression?

Another decision to make in machine learning model building is to determine the type of outcome the investigator is interested in predicting. In machine learning, classification models are considered in the context of categorical labels, whereas regression models are used for continuous labels; each model has different ramifications for model building. Nearly all supervised machine learning models can handle both classification and regression problems. In-depth review of common methods used in health research is beyond the scope of this review but can be found elsewhere (71).

Pretraining/Hyperparameter Optimization

All supervised machine learning algorithms have various hyperparameters that should be adjusted in order to provide a valid and accurate prediction. Some examples include the learning rate in neural networks, C and sigma in a support vector machine, or k in the k-nearest neighbor algorithm. The process of adjusting these hyperparameters is called tuning. Although most machine learning models have default values for each hyperparameter, it is worth the effort to optimize these parameters. To tune hyperparameters, a subset of the data is needed. There are many heuristics to determine how much data should be used for tuning these parameters, but there is no consensus. We recommend that ∼50% of the available data be randomly selected for hyperparameter tuning through cross-validation. This is only a generic heuristic and should be modified on the basis of the variation present in the features and outcome.

The rationale for utilizing a large portion of the data for hyperparameter tuning is that the optimal parameters cannot be known before running a model. An invalid model may result if investigators do not provide appropriate values (59). Several approaches to tuning have been described (8, 31). Grid search approaches are easy to implement and allow for prespecification of a multitude of possible values for many or all of the hyperparameters required by the model. The limitation of grid searching is that it is computationally intensive: because the investigator specifies a set of values for several hyperparameters, models must be built for every combination of values. Model tuning is critical and continues to be discussed as a salient concept in epidemiology. Tessmer and colleagues (61) showcase this with respect to improving R0 calculations in infectious disease epidemiology and dynamics.
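A grid search over the SVM hyperparameters mentioned above can be sketched with cross-validation as follows; the grid values and simulated data are arbitrary choices for illustration (note that scikit-learn parameterizes the RBF kernel width as gamma rather than sigma):

```python
# Sketch: exhaustive grid search with 5-fold cross-validation.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5).astype(int)  # nonlinear class boundary

# One model is fit per fold per combination, so grids grow multiplicatively:
# here 3 x 3 = 9 combinations, each evaluated with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The multiplicative growth of the grid is exactly the computational burden discussed above: adding a third hyperparameter with five candidate values would multiply the number of model fits by five.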

Another consideration applies specifically to classification models. In this context, the pretraining data set may need to be balanced with respect to the class label frequency. Here, the class label with the lowest frequency of cases is termed the minority class, whereas the class with the highest frequency is termed the majority class. For many classification algorithms, having a relatively balanced outcome is critical (what this means is debated, though as close to 50/50 as possible is ideal). Imbalance in the outcome of a model is an issue when evaluating model performance statistics: if one class label has a much higher prevalence than another, predictive accuracy may look good while the model is predicting only the majority class. In this context, downsampling, upsampling, and the synthetic minority oversampling technique (SMOTE) (14) are common approaches (see the sidebar titled Synthetic Minority Oversampling Technique for Handling Class Imbalance). Class balancing should be done only after splitting the data and should be independent of both the training and the testing data sets. Balancing methods have been utilized in many areas of epidemiology, including in cancer survivorship prediction (24, 67), groundwater contamination (38), and mesothelioma patients (17). Alghamdi and colleagues (1) used data from the Henry Ford Health system to predict incident diabetes with cardiorespiratory fitness data, using SMOTE to balance the outcome.

SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE FOR HANDLING CLASS IMBALANCE

Issue: Class imbalance in the outcome for clinical and epidemiological data sets prevents machine learning algorithms from learning accurately.

Solution: Synthetic minority oversampling technique (SMOTE) is a technique in which the minority class (e.g., the group with the lowest frequency) in a classification problem is oversampled by creating synthetic samples that are similar to actual samples. With this approach, the machine learning algorithm has more examples of the minority class from which to learn. The algorithm can further be combined with undersampling of the majority class to create more balanced outcomes in the data set.

Conclusion: The combination of SMOTE and undersampling performs better than undersampling alone because it focuses learning on the minority class.
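The interpolation step at the heart of SMOTE can be sketched in a few lines. This is a simplified illustration of the idea, not the reference implementation; production code would typically use a maintained library such as imbalanced-learn:

```python
# Sketch of the SMOTE idea: synthesize minority-class rows by interpolating
# between a minority sample and one of its nearest minority-class neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is each point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # a randomly chosen true neighbor
        lam = rng.random()                    # interpolation weight in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(4)
X_minority = rng.normal(loc=2.0, size=(20, 3))  # simulated minority-class rows
X_new = smote_like(X_minority, n_new=80)
print(X_new.shape)
```

Because each synthetic row lies on a segment between two real minority samples, the new points resemble, but do not duplicate, observed cases, which is what gives the algorithm more minority examples to learn from.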

Furthermore, continuous data typically should be normalized prior to training to standardize the scale of multiple continuous features and improve computational performance. It is important to normalize the data after splitting the data sets into pretraining, training, and testing. It is not advised to normalize and then split, as data leakage may occur, resulting in aberrant model performance statistics. The goal of the test set is to make it as independent of the other data as possible. If the cases ending up in the test set have features that have been scaled in consideration of some of the data in the training set, leakage will become an issue. Standardization and normalization of continuous data are necessary for many machine learning models. Seligman and colleagues (54) used data standardization approaches to understand social determinants of health in the Health and Retirement Study.

Training

After identification of optimal hyperparameters, the next step is to split the remaining data into training and testing data sets. When defining the proportion of cases to use for a training data set, researchers face many considerations, and no single proportion is always acceptable. Major considerations include (a) the number of cases, (b) the number of features, and (c) the amount of variation in the features. What matters is how well the training data set captures all the possible patterns in the data and their potential prediction of the label. In the literature reviewed, 80% of cases are most often used for training, although this is simply a heuristic and is not evidence based.

Similar to the hyperparameter tuning set described above, training data must be balanced with respect to the outcome for many models in the context of classification. As above, they should be balanced after splitting and be independent of all data in pretraining and testing data sets. Again as described above for hyperparameter tuning, normalization or standardization of continuous variables should be conducted after splitting.

Testing

Testing the performance of the tuned and trained machine learning model in a separate data set (the test set) requires its own proportion of the total data. The goal is to have a useful representation of real life in the testing data set. One must be careful to ensure that there is no spillover from the training data set. Here, no balancing of the outcome minority class should occur; however, if features are normalized for training, they should also be normalized in the testing set, using the normalization factors from the training set. For example, if a column in the training set is normalized such that each value has the column mean subtracted and is divided by the column standard deviation (a common method of normalization), the mean and standard deviation from the training set should be applied to the values in the testing data set. The rationale for this approach arises when a model is in production. In this scenario, a single row of data (i.e., an individual's data) is supplied to the model for prediction, and there would be no other data from which to standardize this individual other than the training data mean and standard deviation.
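The train-statistics rule described above is short to express in code. This sketch (with simulated values) standardizes a test split, and a hypothetical single production row, using only the training mean and standard deviation:

```python
# Sketch: standardize test data with statistics computed on the training set only.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(loc=50, scale=10, size=(100, 3))
X_train, X_test = X[:80], X[80:]   # split first, then compute statistics

mu = X_train.mean(axis=0)          # computed on training data only
sd = X_train.std(axis=0)

X_train_z = (X_train - mu) / sd
X_test_z = (X_test - mu) / sd      # reuse the training mean and SD

# In production, a single new row is scaled the same way:
new_row = np.array([55.0, 48.0, 61.0])
new_row_z = (new_row - mu) / sd
print(new_row_z)
```

Computing mu and sd before the split would leak information about the test cases into the training features, producing the aberrant performance statistics the text warns about.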

Estimating Treatment Effects

Machine learning has traditionally been focused strictly on predictive modeling, at the expense of determining treatment effects. However, causal inference and treatment effect estimation are central considerations for epidemiologists. In machine learning, treatment effects have historically been of little interest because models are created to produce predictions of the future as opposed to direct interpretation of predictor–outcome relationships (e.g., average treatment effects). This approach does not translate to a model that is effective for causal inference of model parameters. However, investigators have developed several methods for estimating treatment effects from machine learning models (20).

Heterogeneous Treatment Effects

Investigators utilizing machine learning approaches have begun to explore heterogeneous treatment effects as opposed to the overall average treatment effects (2, 10, 16, 26, 30, 44). Heterogeneous treatment effects are those that are systematically different within different groups of study subjects, often called conditional average treatment effects. One can think of identifying heterogeneous treatment effects as identifying effect modification; however, exploration of heterogeneous treatment effects can be much more rigorous, comprising multiple features as opposed to just a one-way or two-way interaction term in a regression model. Investigators have developed machine learning models to detect these very specific clusters of individuals who showcase different treatment effects within their cluster of similar individuals. The most prominent example of cluster detection for calculating heterogeneous treatment effects is within causal forest models, a form of random forests that allows for the detection of subgroups of similar individuals who display different predictor–outcome effects (66). These models have been sparsely used in epidemiology; a 2017 example from Baum and colleagues (4) evaluates heterogeneous treatment effects in the Look AHEAD trial, an evaluation of weight loss interventions for reducing cardiovascular complications of type 2 diabetes (see the sidebar titled Heterogeneous Treatment Effects in the Look AHEAD Trial). Other methods are available for identifying these effects in machine learning, including through the use of Bayesian additive regression trees (BART) and artificial neural networks (ANNs). In 2019, Künzel and colleagues (36) presented X-learner, a unified method to calculate heterogeneous treatment effects that allows for computation in the presence of complex distributions of treatment effects.

HETEROGENEOUS TREATMENT EFFECTS IN THE LOOK AHEAD TRIAL

Issue: The Look AHEAD trial found no significant reduction in cardiovascular events among patients with type 2 diabetes undergoing weight loss interventions. In other words, the average treatment effect of the weight loss intervention on cardiovascular events was not significant.

Solution: Using a causal forest, a type of machine learning algorithm, investigators identified a subset comprising 75% of enrolled patients in whom the intervention was significantly associated with reductions in cardiovascular events.

Conclusion: Randomized trials, although providing the highest level of evidence, focus largely on average treatment effects. Given the heterogeneity of enrolled patients, a trial may deem a treatment ineffective on average even when subpopulations exist in whom the treatment is effective or harmful.

Defining heterogeneous treatment effects can be particularly useful in the case of a negative study, when the results are inconclusive or suggest that the intervention is not effective. In these negative studies, there may be subpopulations or clusters of individuals who have a different and sometimes clinically meaningful treatment effect (10). Although one can use traditional methods to identify subpopulations through stratified regression modeling or the inclusion of interaction terms, the epidemiologist will run into multiple testing bias, a common pitfall in frequentist statistics: In methods driven by hypothesis testing, each additional test run on the data increases the amount of statistical error present. If an epidemiologist wishes to evaluate 10 different variables as potential modifiers of the treatment effect, the level of statistical error increases substantially. The machine learning methods described here avoid this issue because they take an algorithmic approach to defining treatment effects rather than performing repeated hypothesis tests on the same data; therefore, one can evaluate many candidate variables for defining heterogeneous treatment effects without the same inflation of statistical error (i.e., multiple comparisons bias).
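A quick calculation shows why repeated hypothesis testing inflates statistical error. Assuming independent tests each run at a significance level of 0.05, the probability of at least one false positive grows rapidly with the number of tests:

```python
# Family-wise error rate: probability of at least one false positive
# across k independent hypothesis tests, each at alpha = 0.05.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests -> P(at least one false positive) = {fwer:.2f}")
```

With 10 tests the family-wise error rate already reaches about 40%, far above the nominal 5% level.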

These approaches are timely as we move toward personalized medicine and personalized health (46), giving epidemiologists a much better analytical handle on individual-level variation in treatment effects.

Defining Model Performance

Many methods are available to determine whether a trained model predicts the outcome in the testing data set with acceptably little error. Little novelty has emerged in this area; depending on whether the outcome is continuous or categorical, most epidemiologists rely on mean squared error, root mean squared error, accuracy, precision–recall, the area under the receiver operating characteristic (ROC) curve (AUC), and the F1 statistic.

All of these methods provide some evidence of overall model performance, though no single metric is ideal in every circumstance. For example, the AUC of an ROC curve is often used to define model performance, but this value can be deceiving when a categorical outcome is imbalanced. In this case, an acceptable AUC can be obtained even when the model predicts only the majority class and almost none of the minority class. In this context, the early retrieval portion of the ROC curve or precision–recall curves can be used (52). The early retrieval portion is the left-most region of the ROC curve (generally where specificity exceeds 80%, i.e., where 1 − specificity is below 20%). Here, if the model predicts only the majority class, one will see a low AUC because the model has low sensitivity where the false positive rate is low, indicating that the model does not predict positive cases (typically the minority class). Care must be taken when evaluating any classification model to ensure that appropriate performance measures are used whenever the labels are imbalanced.
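To illustrate the pitfall, consider a hypothetical test set of 1,000 subjects in which only 5% belong to the positive (minority) class, evaluated against a degenerate model that always predicts the majority class; the counts below are invented for illustration:

```python
# Confusion-matrix counts for a test set with a 5% positive class and a
# degenerate model that labels every subject as the (negative) majority class.
tp, fn = 0, 50       # all 50 true positives missed
fp, tn = 0, 950      # all 950 true negatives correctly labeled negative

accuracy = (tp + tn) / (tp + tn + fp + fn)                # looks excellent
sensitivity = tp / (tp + fn)                              # catches no cases
specificity = tn / (tn + fp)
precision = tp / (tp + fp) if (tp + fp) else 0.0          # no positive calls made
balanced_accuracy = (sensitivity + specificity) / 2       # chance level
print(f"accuracy={accuracy}, sensitivity={sensitivity}, "
      f"balanced accuracy={balanced_accuracy}")
```

Accuracy comes out at 0.95 while sensitivity is 0 and balanced accuracy sits at chance level (0.5), which is exactly why class-aware metrics matter for imbalanced labels.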

Explaining the Model

Machine learning algorithms are often called black boxes because most of these predictive models are difficult to explain for a single individual. The practitioner is forced to trust the model without any understanding of the potential for a false positive or false negative in an individual prediction. For example, a classification model with 85% balanced accuracy (i.e., accounting for any class imbalance) will still be wrong 15% of the time in practice. If a new data point is supplied and the model returns a prediction, the practitioner cannot tell whether this particular prediction is more or less likely to be in error. To correct this deficiency, novel approaches such as local interpretable model-agnostic explanations (LIME) have been developed (48) and used successfully in epidemiologic studies. For example, Pereira and colleagues (43) used LIME to unbox a random forest classifier to enhance local interpretation of features in brain lesion research.
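LIME's core idea is to probe the black box with perturbed inputs near the case of interest and fit a simple, proximity-weighted local approximation to the model's behavior there. The sketch below illustrates only that principle; it is a simplified per-feature version, not the actual LIME procedure, and `black_box`, `x0`, and all constants are hypothetical:

```python
import math
import random

random.seed(0)

# A hypothetical "black box" model: nonlinear in two features.
def black_box(x1, x2):
    return 1 / (1 + math.exp(-(3 * x1 - 2 * x2 + x1 * x2)))

x0 = (0.5, -1.0)  # the individual prediction we want to explain

def local_slope(feature_idx, n=2000, scale=0.3):
    """Proximity-weighted local slope of the model for one feature near x0.

    Real LIME fits a joint sparse linear surrogate over all features; this
    per-feature estimate just shows the local-approximation principle.
    """
    num, den = 0.0, 0.0
    base = black_box(*x0)
    for _ in range(n):
        d = random.gauss(0, scale)              # perturb one feature
        x = list(x0)
        x[feature_idx] += d
        w = math.exp(-(d * d) / (2 * scale**2)) # weight by proximity to x0
        num += w * d * (black_box(*x) - base)
        den += w * d * d
    return num / den

s1 = local_slope(0)
s2 = local_slope(1)
print(f"local weight, feature 1: {s1:+.2f}")
print(f"local weight, feature 2: {s2:+.2f}")
```

For this individual, the surrogate reports that feature 1 pushes the prediction up and feature 2 pushes it down, giving the practitioner a case-specific explanation rather than a global one.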

FUTURE DIRECTIONS

Deep Learning

ANNs have been used in machine learning for many years; the phrase deep learning has become popular to describe ANNs with many hidden layers. These models, while very complex, are extremely flexible: They allow the epidemiologist to include an almost unlimited number of features for classification or regression tasks, and they are very accurate at modeling highly nonlinear relationships. Their main limitations have been hyperparameter selection and training, as large ANNs may be too computationally intensive to run on local machines, limiting their wide adoption. In 2018, Mocanu and colleagues (39) provided an alternative to traditional ANN training, using a sparse evolutionary approach that mimics natural evolution: building models and adding features as long as model performance improves. Taking a cue from network analytics and graph theory, they provide a solution to the training time limitations without a decrease in model performance.
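The layered, nonlinear structure that makes ANNs flexible can be seen even in a minimal example. The sketch below trains a toy one-hidden-layer network on the XOR problem, a nonlinear relationship that no linear model can fit; the architecture, learning rate, and epoch count are arbitrary choices for illustration, not a recipe for real epidemiologic data:

```python
import math
import random

random.seed(1)

# XOR: output is 1 exactly when the two inputs differ -- a nonlinear pattern.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Weights: 2 inputs -> 2 hidden units -> 1 output (plus biases).
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

def forward(x):
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    return h, sigmoid(sum(w2[j] * h[j] for j in range(2)) + b2)

def total_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

loss_before = total_loss()
lr = 0.5
for _ in range(5000):
    for x, y in data:
        h, out = forward(x)
        # Backpropagation for squared error with sigmoid activations.
        d_out = 2 * (out - y) * out * (1 - out)
        for j in range(2):
            d_h = d_out * w2[j] * h[j] * (1 - h[j])  # use w2 before updating it
            w2[j] -= lr * d_out * h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out
loss_after = total_loss()
print(f"squared-error loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Stacking more hidden layers extends this same construction, which is what the term deep learning describes.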

One novel area of deep learning in clinical epidemiology and medicine is image recognition and computer vision. This has been a broad area of research in the biomedical sciences, with reviews published on the topic (15) and many international competitions devoted to image analytics, such as the Medical Image Computing and Computer Assisted Intervention conference and the ImageCLEF evaluation campaigns. Areas of work range from image segmentation to object tracking and image detection. Evaluations of work in this area have raised concerns about the generalizability of research and competition findings (37). Despite these concerns, this area will continue to be important, with applications in epidemiology for many years.

Interventional Machine Learning

The black box issue of many machine learning models creates difficulty in initiating interventions to reduce the risk of poor health outcomes. We offer two concerns. First, because models do not explicitly estimate the impact of each individual feature on the model's label, targeting interventions at modifiable risk factors is difficult. Second, if interventions do take place to reduce the risk of a poor health outcome, model performance statistics will degrade without constant retraining. For example, if a model identifies an individual with a high probability of death, but an intervention reduces that probability and the individual survives, the model's prediction was technically incorrect: The model predicted death, but the individual survived. A workaround could be to supply the model with an indicator for intervention; however, on initial training this indicator would be zero for all individuals, providing no variation from which the model could learn.

Novel neural networks have been developed to help reduce these issues. A 2019 example from Shickel and colleagues (56) used gated recurrent unit neural networks to identify individuals at risk of in-hospital mortality. This model not only allowed practitioners to model the probability of death longitudinally but also documented how strongly various time points contributed to the prediction. Therefore, the investigators could target interventions on the basis of the model's predictions as well as model changes in the probability over time. Models such as these have a bright future in epidemiology, finally allowing researchers to model many nonlinear relationships in a longitudinal manner while reducing our reliance on blindly obeying what the model predicts.

PUTTING IT ALL TOGETHER AND LEARNING MORE

Machine learning continues to be a burgeoning area in data analytics. Computing software is making it increasingly easy to learn to create, build, tune, and implement machine learning models. Open-source software such as H2O (https://www.h2o.ai) and Keras (https://keras.io), as well as a multitude of pay-for-model platforms, are widely available and extremely powerful, and many limit or eliminate the need for knowledge of computer programming. Learning from scratch can be a daunting process, though many online and in-person courses are available, as well as programs offering bachelor's, master's, and doctoral degrees in computer science, machine learning, data analytics, and data science.

Partnering with new team members is another pathway to creating machine learning models for various needs. Academic areas such as data science, biostatistics, data analytics, business, computer science, and computational biology are helpful places to begin a search for experts. Outside of academia, many businesses across industries employ experts in machine learning. Networking through colleagues and at local, regional, national, and international congresses can build a team of experts quickly.

CONCLUSIONS

In conclusion, machine learning is becoming increasingly popular not only for developing predictive models but also for defining treatment effects in epidemiology. Improving these approaches, explaining risk factors, and producing full-scale production algorithms for rapid prediction and improvements in population health all serve as ripe areas for continued research.

SUMMARY POINTS

1. Machine learning is a rapidly advancing area of data analytics.

2. Traditional regression models can be used for machine learning needs, though more algorithmic methods are often considered machine learning.

3. Development of a machine learning model includes feature selection, feature engineering, dealing with missing data, training the model, tuning the hyperparameters, testing the model, evaluating its performance, and explaining or operationalizing the final trained model in production.

4. The complexities of machine learning applications are being greatly reduced with the introduction of open-source machine learning platforms, many of which have a point-and-click interface as opposed to tools that necessitate in-depth knowledge of computer or statistical programming.

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

LITERATURE CITED

1. Alghamdi M, Al-Mallah M, Keteyian S, Brawner C, Ehrman J, Sakr S. 2017. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project. PLOS ONE 12(7):e0179805
2. Athey S, Imbens G. 2016. Recursive partitioning for heterogeneous causal effects. PNAS 113(27):7353–60
3. Bandhary S, Contreras-Mora BY, Gupta R, Fernandez P, Jimenez P, et al. 2017. Clinical outcomes of community-acquired pneumonia in patients with diabetes mellitus. J. Respir. Infect. 1(1):23–28
4. Baum A, Scarpa J, Bruzelius E, Tamler R, Basu S, Faghmous J. 2017. Targeting weight loss interventions to reduce cardiovascular complications of type 2 diabetes: a machine learning-based post-hoc analysis of heterogeneous treatment effects in the Look AHEAD trial. Lancet Diabetes Endocrinol. 5(10):808–15
5. Beauclair R, Hens N, Delva W. 2018. The role of age-mixing patterns in HIV transmission dynamics: novel hypotheses from a field study in Cape Town, South Africa. Epidemics 25:61–71
6. Beleites C, Neugebauer U, Bocklitz T, Krafft C, Popp J. 2013. Sample size planning for classification models. Anal. Chim. Acta 760:25–33
7. Bellman R. 2015. Adaptive Control Processes: A Guided Tour. Princeton, NJ: Princeton Univ. Press
8. Bergstra J, Bengio Y. 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13:281–305
9. Blum AL, Langley P. 1997. Selection of relevant features and examples in machine learning. Artif. Intell. 97(1):245–71
10. Bonetti M, Gelber RD. 2004. Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics 5(3):465–81
11. Breiman L. 2001. Random forests. Mach. Learn. 45:5–32
      Annual Review of Clinical Psychology Vol. 12: 83 - 104
      • ...a Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnosis of major depressive disorder (MDD)]. Decision trees (Brieman 2001, Brieman et al. 1984, Quinlan 1993) represent an attractive framework for designing adaptive predictive tests because their corresponding models can be represented as a sequence of binary decisions....
      • ...Decision trees (Brieman 2001, Brieman et al. 1984, Quinlan 1993) represent a model in terms of a flow chart....
      • ...ensemble models constructed of averages of hundreds of decision trees have received considerable attention in statistics and machine learning (Brieman 1996, 2001...
      • ...Random forests require minimal human intervention and have historically exhibited good performance across a wide range of domains (Brieman 2001, Hastie et al. 2009)....
    • Diboson Production at Colliders

      Mark S. NeubauerDepartment of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801; email: [email protected]
      Annual Review of Nuclear and Particle Science Vol. 61: 223 - 250
      • ...dijet mass) that can sensitively distinguish between signal and background as input to a random forest (75) multivariate event classifier. Figure 9 shows the dijet mass distribution obtained from the results of the random forest output fit....

  • 12. 
    Breiman L. 2001. Statistical modeling: the two cultures. Stat. Sci. 16(3):199–215
  • 13. 
    Büssing A, Falkenberg Z, Schoppe C, Recchia DR, Poier D. 2017. Work stress associated cool down reactions among nurses and hospital physicians and their relation to burnout symptoms. BMC Health Serv. Res. 17(1):551
  • 14. 
    Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. 2002. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16:321–57
  • 15. 
    Chen W, Li W, Dong X, Pei J. 2018. A review of biological image analysis. Curr. Bioinform. 13:337–43
  • 16. 
    Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, et al. 2016. Double/debiased machine learning for treatment and causal parameters. arXiv:1608.00060
  • 17. 
    Chicco D, Rovelli C. 2019. Computational prediction of diagnosis and feature selection on mesothelioma patient health records. PLOS ONE 14(1):e0208737
  • 18. 
    De La Fuente J, Villar M, Estrada-Peña A, Olivas JA. 2018. High throughput discovery and characterization of tick and pathogen vaccine protective antigens using vaccinomics with intelligent Big Data analytic techniques. Expert Rev. Vaccines 17(7):569–76
  • 19. 
    Delahanty RJ, Alvarez J, Flynn LM, Sherwin RL, Jones SS. 2019. Development and evaluation of a machine learning model for the early identification of patients at risk for sepsis. Ann. Emerg. Med. 73:334–44
  • 20. 
    Fang G, Annis IE, Elston Lafata J, Cykert S. 2019. Applying machine learning to predict real-world individual treatment effects: insights from a virtual patient cohort. J. Am. Med. Inform. Assoc. 26(10):977–88
  • 21. 
    Figueroa RL, Zeng-Treitler Q, Kandula S, Ngo LH. 2012. Predicting sample size required for classification performance. BMC Med. Inform. Decis. Mak. 12(1):8
  • 22. 
    Flaxman AD, Vos T. 2018. Machine learning in population health: opportunities and threats. PLOS Med. 15(11):e1002702
  • 23. 
    Forbes. 2018. The rise in computing power: why ubiquitous artificial intelligence is now a reality. Forbes, July 17. https://www.forbes.com/sites/intelai/2018/07/17/the-rise-in-computing-power-why-ubiquitous-artificial-intelligence-is-now-a-reality/#22a73011d3f3
  • 24. 
    Fotouhi S, Asadi S, Kattan MW. 2019. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J. Biomed. Inform. 90:103089
  • 25. 
    Frérot M, Lefebvre A, Aho S, Callier P, Astruc K, Aho Glélé LS. 2018. What is epidemiology? Changing definitions of epidemiology 1978–2017. PLOS ONE 13(12):e0208442
  • 26. 
    Green DP, Kern HL. 2012. Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opin. Q. 76(3):491–511
  • 27. 
    Greenland S, Poole C. 2013. Living with p values: resurrecting a Bayesian perspective on frequentist statistics. Epidemiology 24(1):62–68
  • 28. 
    Hastie T, Tibshirani R, Friedman JH. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. 2nd ed.
    • Crossref
    • Google Scholar
    Article Location
    More AR articles citing this reference

    • Small Steps with Big Data: Using Machine Learning in Energy and Environmental Economics

      Matthew C. Harding1 and Carlos Lamarche21Department of Economics and Department of Statistics, University of California, Irvine, California 92697; email: [email protected]2Department of Economics, Gatton College of Business and Economics, University of Kentucky, Lexington, Kentucky 40506
      Annual Review of Resource Economics Vol. 13: 469 - 488
      • ...cross-validation) for the selection of the tuning parameter λ (see, e.g., Hastie et al. 2009)....
    • Emerging Applications of Machine Learning in Food Safety

      Xiangyu Deng,1 Shuhao Cao,2 and Abigail L. Horn31Center for Food Safety, University of Georgia, Griffin, Georgia 30223, USA; email: [email protected]2Department of Mathematics and Statistics, Washington University, St. Louis, Missouri 63105, USA; email: [email protected]3Department of Preventive Medicine, University of Southern California, Los Angeles, California 90032, USA; email: [email protected]
      Annual Review of Food Science and Technology Vol. 12: 513 - 538
      • ...although at the cost of losing unbiasedness (Hastie et al. 2009) and parameter-prediction mechanics that are comprehensible by humans....
    • Artificial Intelligence, Predictive Policing, and Risk Assessment for Law Enforcement

      Richard A. BerkDepartments of Statistics and Criminology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; email: [email protected]
      Annual Review of Criminology Vol. 4: 209 - 237
      • ...and Xj is the jth predictor from a set of P predictors (Hastie et al. 2009)....
      • ...Figure 4 A simple deep-learning neural network with one hidden layer composed of M nodes [adapted from Berk (2020b) and Hastie et al. (2009)]....
      • ...ML methods are not constrained in this manner (Hastie et al. 2009)....
      • ...Another approach displays how each predictor is related to the response holding all other predictors constant without the use of the standard covariance adjustments employed by linear regression (Hastie et al. 2009)....
    • A Survey of Tuning Parameter Selection for High-Dimensional Regression

      Yunan Wu and Lan WangSchool of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, USA; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 7: 209 - 226
      • ...Statistical methods for analyzing high-dimensional data have been the focus of an enormous amount of research in the past decade or so; readers are directed to the books of Hastie et al. (2009), Bühlmann & Van de Geer (2011), Hastie et al. (2015), ...
    • Big Data in Industrial-Organizational Psychology and Human Resource Management: Forward Progress for Organizational Research and Practice

      Frederick L. Oswald,1 Tara S. Behrend,2 Dan J. Putka,3 and Evan Sinar41Department of Psychological Sciences, Rice University, Houston, Texas 77005, USA; email: [email protected]2Department of Organizational Sciences and Communication, George Washington University, Washington, DC 20052, USA3Human Resources Research Organization, Alexandria, Virginia 22314, USA4BetterUp, Pittsburgh, Pennsylvania 15243, USA
      Annual Review of Organizational Psychology and Organizational Behavior Vol. 7: 505 - 533
      • ...principal components regression; Hastie et al. 2009) or the composition and number of those components (e.g., ...
      • ...which provides a useful blend of conceptual and theoretical description with practical implementation; and Hastie et al. (2009), ...
    • Machine Learning for Fluid Mechanics

      Steven L. Brunton,1 Bernd R. Noack,2,3 and Petros Koumoutsakos41Department of Mechanical Engineering, University of Washington, Seattle, Washington 98195, USA2LIMSI (Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur), CNRS UPR 3251, Université Paris-Saclay, F-91403 Orsay, France3Institut für Strömungsmechanik und Technische Akustik, Technische Universität Berlin, D-10634 Berlin, Germany4Computational Science and Engineering Laboratory, ETH Zurich, CH-8092 Zurich, Switzerland; email: [email protected]
      Annual Review of Fluid Mechanics Vol. 52: 477 - 508
      • ...A commonly employed loss function is 2. Alternative loss functions may reflect different constraints on the LM such as sparsity (Hastie et al. 2009, Brunton & Kutz 2019)....
    • Machine Learning Methods That Economists Should Know About

      Susan Athey1,2,3 and Guido W. Imbens1,2,3,41Graduate School of Business, Stanford University, Stanford, California 94305, USA; email: [email protected], [email protected]2Stanford Institute for Economic Policy Research, Stanford University, Stanford, California 94305, USA3National Bureau of Economic Research, Cambridge, Massachusetts 02138, USA4Department of Economics, Stanford University, Stanford, California 94305, USA
      Annual Review of Economics Vol. 11: 685 - 725
      • ...and many textbooks discuss ML methods alongside more traditional statistical methods (e.g., Hastie et al. 2009, Efron & Hastie 2016)....
      • ...including the work of Efron & Hastie (2016); Hastie et al. (2009), ...
      • ...Hastie et al. (2009, 2015) discuss what they call the sparsity principle: ...
    • Machine Learning for Sociology

      Mario Molina and Filiz GaripDepartment of Sociology, Cornell University, Ithaca, New York 14853, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 45: 27 - 45
      • ...This error comprises two components: bias and variance (Hastie et al. 2009)....
      • ...estimating its generalization (prediction) error on new data (Hastie et al. 2009)....
      • ...and a quarter each for validation and testing (Hastie et al. 2009)....
      • ...Boosting involves giving more weight to misclassified observations over repeated estimation (Hastie et al. 2009)....
      • ...4Hastie et al. (2009, table 10.1) compare different methods on several criteria (e.g., ...
    • Precision Medicine

      Michael R. Kosorok1 and Eric B. Laber2 1Department of Biostatistics and Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA; email: [email protected]2Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 6: 263 - 286
      • ... uses this framework with support vector machines (see, e.g., chapter 12 of Hastie et al. 2009)...
    • Forecasting Methods in Finance

      Allan TimmermannRady School of Management, University of California, San Diego, La Jolla, California 92093, USA; email: [email protected]
      Annual Review of Financial Economics Vol. 10: 449 - 479
      • ...have been developed in recent years (for an excellent introduction, see Hastie, Tibshirani & Friedman 2009)....
    • Big Data Approaches for Modeling Response and Resistance to Cancer Drugs

      Peng Jiang,1 William R. Sellers,2 and X. Shirley Liu11Dana–Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, USA; email: [email protected]2Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; email: [email protected]
      Annual Review of Biomedical Data Science Vol. 1: 1 - 27
      • ...These penalties help find coefficients of the optimal solution in high-dimensional settings while preventing the regression procedure from overfitting the training data (86)....
    • Machine Learning Approaches for Clinical Psychology and Psychiatry

      Dominic B. Dwyer, Peter Falkai, and Nikolaos KoutsoulerisDepartment of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany; email: [email protected], [email protected], [email protected]
      Annual Review of Clinical Psychology Vol. 14: 91 - 118
      • ...there are several well-established and highly regarded comprehensive methodological guides to machine learning that include formal statistical nomenclature (Bishop 2006, Hastie et al. 2009, James et al. 2015)....
      • ...and the long computational time (Hastie et al. 2009, Varoquaux et al. 2017)....
      • ...This process is then repeated for a prespecified number of k folds and results in more stable estimates of generalizability outside the sample because the training groups are more variable and there are more individuals in the left-out test sets (Hastie et al. 2009)....
      • ...and while authors recommend 5- or 10-fold CV (Breiman & Spector 1992) or statistical criteria (Hastie et al. 2009, James et al. 2015), ...
      • ...Aspects of the SVM algorithm have developed over time to include the ability to characterize nonlinear hyperplanes by using a data transformation implemented by a kernel function (Hastie et al. 2009, James et al. 2015)....
    • Treatment Selection in Depression

      Zachary D. Cohen and Robert J. DeRubeis. Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; email: [email protected]
      Annual Review of Clinical Psychology Vol. 14: 209 - 236
      • ...researchers must weigh the increased flexibility and predictive power of such approaches against the interpretability (Hastie et al. 2009, James et al. 2013)...
      • ...data from the to-be-predicted patient cannot be included in the course of development of the algorithm (Hastie et al. 2009)....
    • Forecasting in Economics and Finance

      Graham Elliott1 and Allan Timmermann1,2. 1Department of Economics, University of California, San Diego, La Jolla, California 92093; email: [email protected]. 2Center for Research in Econometric Analysis of Time Series, Aarhus University, DK-8210 Aarhus, Denmark
      Annual Review of Economics Vol. 8: 81 - 110
      • ...these methods will receive further consideration in future work. Hastie et al. (2009) provide a terrific introduction to statistical learning methods, ...
    • League Tables for Hospital Comparisons

      Sharon-Lise T. Normand,1 Arlene S. Ash,2 Stephen E. Fienberg,3 Thérèse A. Stukel,4 Jessica Utts,5 and Thomas A. Louis6. 1Department of Health Care Policy, Harvard Medical School, and Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115; email: [email protected]. 2Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, Massachusetts 01605; email: [email protected]. 3Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; email: [email protected]. 4Institute for Clinical Evaluative Sciences, Toronto, Ontario M4N 3M5, Canada; Institute of Health Policy, Management & Evaluation, University of Toronto, Toronto, Ontario M5T 3M6, Canada; and Dartmouth Institute for Health Policy and Clinical Practice, Hanover, New Hampshire 03766; email: [email protected]. 5Department of Statistics, University of California, Irvine, California 92697; email: [email protected]. 6Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 3: 21 - 50
      • ...and boosting (Berk 2008, Breiman 2001, Hastie et al. 2009, McCaffrey et al. 2004), ...
    • Far Right Parties in Europe

      Matt Golder. Department of Political Science, Pennsylvania State University, University Park, Pennsylvania 16802; email: [email protected]
      Annual Review of Political Science Vol. 19: 477 - 497
      • ...They have yet to exploit recent developments in data mining, particularly with respect to cluster analysis (Hastie et al. 2009)....
    • Computerized Adaptive Diagnosis and Testing of Mental Health Disorders

      Robert D. Gibbons,1 David J. Weiss,2 Ellen Frank,3 and David Kupfer3. 1Center for Health Statistics and Departments of Medicine and Public Health Sciences, University of Chicago, Chicago, Illinois 60612; email: [email protected]. 2Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455. 3Department of Psychiatry and Western Psychiatric Institute and Clinic, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
      Annual Review of Clinical Psychology Vol. 12: 83 - 104
      • ...have received considerable attention in statistics and machine learning (Breiman 1996, Hastie et al. 2009)....
      • ...decision trees have frequently suffered from poor performance (Hastie et al. 2009) because algorithms used to build trees from data can exhibit sensitivity to small changes in the data sets that are provided....
      • ...Random forests require minimal human intervention and have historically exhibited good performance across a wide range of domains (Breiman 2001, Hastie et al. 2009)....
    • Modular Brain Networks

      Olaf Sporns1,2 and Richard F. Betzel1. 1Department of Psychological and Brain Sciences, Indiana University, Bloomington, Indiana 47405; email: [email protected]. 2Indiana University Network Science Institute, Indiana University, Bloomington, Indiana 47405
      Annual Review of Psychology Vol. 67: 613 - 640
      • ...Distance-based modules. One of the simplest methods for detecting modules in complex networks is to extend distance-based clustering techniques to be compatible with network data (Hastie et al. 2009)....
    • Analytics of Insurance Markets

      Edward W. Frees. Wisconsin School of Business, University of Wisconsin–Madison, Madison, Wisconsin 53706; email: [email protected]
      Annual Review of Financial Economics Vol. 7: 253 - 277
      • ...predictive analytics means advanced data-mining tools such as described in The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani & Friedman 2009)....
    • Empirical Comparative Law

      Holger Spamann. Harvard Law School, Harvard University, Cambridge, Massachusetts 02138; email: [email protected]
      Annual Review of Law and Social Science Vol. 11: 131 - 153
      • ...This is the “bet on sparsity” principle coined by Hastie et al. (2009, ...
    • Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach

      Victor Chernozhukov,1 Christian Hansen,2 and Martin Spindler3. 1Department of Economics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142; email: [email protected]. 2University of Chicago Booth School of Business, Chicago, Illinois 60637; email: [email protected]. 3Munich Center for the Economics of Aging, 80799 Munich, Germany; email: [email protected]
      Annual Review of Economics Vol. 7: 649 - 688
      • ...Many other interesting procedures beyond those mentioned in this review have been developed for estimating high-dimensional models (see, e.g., Hastie et al. 2009 for a textbook review)....
    • Statistical Foundations for Model-Based Adjustments

      Sander Greenland1 and Neil Pearce2,3. 1Department of Epidemiology and Department of Statistics, University of California, Los Angeles, California 90095-1772; email: [email protected]. 2Departments of Medical Statistics and Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom; email: [email protected]. 3Centre for Public Health Research, Massey University, Wellington 6140, New Zealand
      Annual Review of Public Health Vol. 36: 89 - 108
      • ...We focus entirely on methods for observational studies of causal effects, there being some excellent texts on purely predictive modeling (79, 80, 94, 122, 132)....
      • ...but we must leave many difficult issues about model specification and diagnostics to more detailed discussions (3, 60, 61, 67, 79, 80, 94, 122, 132, 140)....
      • ... [even though all modern software allows use of better criteria (60, 79, 80, 132)]....
      • ...the resulting model often yields much poorer predictions than can be obtained with modern techniques (79, 80, 132)....
      • ...Defects can be corrected by using advanced resampling and cross-validation methods (80, 122, 132, 140), ...
      • ...such as the Akaike information criterion (AIC) or Bayesian information criterion (BIC) (79, 80, 132); nonetheless, ...
      • ...Again, there are ways to account for this problem (79, 80, 132, 140), but we know of none that are easy to implement with popular software....
      • ...such as cross-validation or bootstrapping, to a level that rivals much more sophisticated algorithms (80)....
      • ... and can be applied similarly to outcome modeling (80) as long as exposure is forced into the model....
    • High-Dimensional Statistics with a View Toward Applications in Biology

      Peter Bühlmann, Markus Kalisch, and Lukas Meier. Seminar for Statistics, ETH Zürich, CH-8092 Zürich, Switzerland; email: [email protected], [email protected], [email protected]
      Annual Review of Statistics and Its Application Vol. 1: 255 - 278
      • ...Assessing the accuracy of prediction is relatively straightforward using the tool of cross-validation (cf. Hastie et al. 2009)....
    • Sensors and Decoding for Intracortical Brain Computer Interfaces

      Mark L. Homer,1 Arto V. Nurmikko,2 John P. Donoghue,4,3 and Leigh R. Hochberg4,2,5. 1Biomedical Engineering, 2School of Engineering, 3Department of Neuroscience, Brown University, Providence, Rhode Island 02912; email: [email protected]. 4Center for Neurorestoration and Neurotechnology, Veterans Affairs Medical Center, Providence, Rhode Island 02908. 5Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114
      Annual Review of Biomedical Engineering Vol. 15: 383 - 405
      • ...akin to forward stepwise regression, can find promising, though typically not optimal, subsets (83)....
    • Sparse High-Dimensional Models in Economics

      Jianqing Fan,1,2 Jinchi Lv,3 and Lei Qi1,2. 1Bendheim Center for Finance and 2Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544; email: [email protected], [email protected]. 3Information and Operations Management Department, Marshall School of Business, University of Southern California, Los Angeles, California 90089; email: [email protected]
      Annual Review of Economics Vol. 3: 291 - 317
      • ...such as text and document classification and computer vision (see Hastie et al. 2009 for more examples)....
    • Species Distribution Models: Ecological Explanation and Prediction Across Space and Time

      Jane Elith1 and John R. Leathwick2. 1School of Botany, The University of Melbourne, Victoria 3010, Australia; email: [email protected]. 2National Institute of Water and Atmospheric Research, Hamilton, New Zealand; email: [email protected]
      Annual Review of Ecology, Evolution, and Systematics Vol. 40: 677 - 697
      • ...both within the model-fitting process, and for model evaluation (Hastie et al. 2009)....
      • ...Other model averaging techniques from computer science use a range of approaches to concurrently develop a set of models that together predict well (Hastie et al. 2009)....
      • ...The different information criteria provide a range of trade-offs between model complexity and predictive performance and can be used within cross-validation to select a model (Hastie et al. 2009)....

  • 29. 
    Hua J, Xiong Z, Lowey J, Suh E, Dougherty ER. 2005. Optimal number of features as a function of sample size for various classification rules. Bioinformatics 21(8):1509–15
  • 30. 
    Imai K, Ratkovic M. 2013. Estimating treatment effect heterogeneity in randomized program evaluation. Ann. Appl. Stat. 7(1):443–70
    More AR articles citing this reference

    • Machine Learning for Social Science: An Agnostic Approach

      Justin Grimmer,1 Margaret E. Roberts,2 and Brandon M. Stewart3. 1Department of Political Science and Hoover Institution, Stanford University, Stanford, California 94305, USA; email: [email protected]. 2Department of Political Science and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, California 92093, USA; email: [email protected]. 3Department of Sociology and Office of Population Research, Princeton University, Princeton, New Jersey 08540, USA; email: [email protected]
      Annual Review of Political Science Vol. 24: 395 - 419
      • ...a growing literature uses machine learning methods to estimate heterogeneous treatment effects: how the effect of a particular intervention differs across characteristics of individuals (Imai & Ratkovic 2013, Athey & Imbens 2016, Grimmer et al. 2017, Wager & Athey 2018, Künzel et al. 2019)....
    • External Validity

      Michael G. Findley,1 Kyosuke Kikuta,2 and Michael Denly1. 1Department of Government, University of Texas, Austin, Texas 78712, USA; email: [email protected]. 2Osaka School of International Public Policy, Osaka University, Osaka 560-0043, Japan
      Annual Review of Political Science Vol. 24: 365 - 393
      • ...making it sometimes necessary to devise a correction or weighting method for external validity inferences (Cole & Stuart 2010; Imai & Ratkovic 2013...
    • Randomized Experiments in Education, with Implications for Multilevel Causal Inference

      Stephen W. Raudenbush1 and Daniel Schwartz2. 1Department of Sociology, Harris School of Public Policy, and Committee on Education, University of Chicago, Chicago, Illinois 60637, USA; email: [email protected]. 2Department of Public Health Sciences, University of Chicago, Chicago, Illinois 60637, USA
      Annual Review of Statistics and Its Application Vol. 7: 177 - 208
      • ... or by statistical methods for variable selection and high-dimensional data (Green & Kern 2012, Guo et al. 2017, Imai & Ratkovic 2013, Wager & Athey 2018)....
    • Machine Learning Methods That Economists Should Know About

      Susan Athey1,2,3 and Guido W. Imbens1,2,3,4. 1Graduate School of Business, Stanford University, Stanford, California 94305, USA; email: [email protected], [email protected]. 2Stanford Institute for Economic Policy Research, Stanford University, Stanford, California 94305, USA. 3National Bureau of Economic Research, Cambridge, Massachusetts 02138, USA. 4Department of Economics, Stanford University, Stanford, California 94305, USA
      Annual Review of Economics Vol. 11: 685 - 725
      • ...Targeted maximum likelihood (van der Laan & Rubin 2006) is one approach to this, while Imai & Ratkovic (2013)...
    • Machine Learning for Sociology

      Mario Molina and Filiz Garip. Department of Sociology, Cornell University, Ithaca, New York 14853, USA; email: [email protected], [email protected]
      Annual Review of Sociology Vol. 45: 27 - 45
      • ...Imai & Ratkovic (2013) discover groups of workers differentially affected by a job training program....
    • Better Government, Better Science: The Promise of and Challenges Facing the Evidence-Informed Policy Movement

      Jake Bowers1 and Paul F. Testa2. 1Department of Political Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA; email: [email protected]. 2Department of Political Science, Brown University, Providence, Rhode Island 02912, USA; email: [email protected]
      Annual Review of Political Science Vol. 22: 521 - 542
      • ...—with applications from the field of machine learning to let the data help identify heterogeneous treatment effects (Imai & Ratkovic 2013, Wager & Athey 2017)....
    • Treatment Selection in Depression

      Zachary D. Cohen and Robert J. DeRubeis. Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; email: [email protected]
      Annual Review of Clinical Psychology Vol. 14: 209 - 236
      • ...and not to potentially important sources of variability in treatment response (Imai & Ratkovic 2013, Kessler et al. 2017)....
    • Data ex Machina: Introduction to Big Data

      David Lazer1,2 and Jason Radford1,3. 1Department of Political Science and College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115; email: [email protected], [email protected]. 2Institute for Quantitative Social Science, Harvard University, Cambridge, Massachusetts 02138. 3Department of Sociology, University of Chicago, Chicago, Illinois 60637
      Annual Review of Sociology Vol. 43: 19 - 39
      • ...Others are developing algorithms that seek to inductively identify groups for whom effects are especially large (Athey & Imbens 2016, Green & Kern 2012, Imai & Ratkovic 2013, Taddy et al. 2016)....
    • Field Experimentation and the Study of Law and Policy

      Donald P. Green1 and Dane R. Thorley1,2. 1Department of Political Science, 2Columbia Law School, Columbia University, New York, NY 10027; email: [email protected], [email protected]
      Annual Review of Law and Social Science Vol. 10: 53 - 72
      • ... and machine learning (Chipman et al. 2010, Imai & Ratkovic 2013, van der Laan & Rose 2011) that facilitate experimental design and data analysis....
    • Dynamic Treatment Regimes

      Bibhas Chakraborty1 and Susan A. Murphy2. 1Duke-NUS Graduate Medical School, National University of Singapore, Singapore 169857; email: [email protected]. 2Department of Statistics and Institute for Social Research, University of Michigan, Ann Arbor, Michigan 48109; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 1: 447 - 464
      • ...Other work in the one-stage decision setting includes Cai et al. (2011) and Imai & Ratkovic (2013)....

  • 31. 
    Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, et al. 2017. Population based training of neural networks. arXiv:1711.09846 [Cs]
  • 32. 
    Jensen PB, Jensen LJ, Brunak S. 2012. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13(6):395–405
    More AR articles citing this reference

    • The Digitization of Patient Care: A Review of the Effects of Electronic Health Records on Health Care Quality and Utilization

      Hilal Atasoy,1 Brad N. Greenwood,2 and Jeffrey Scott McCullough3. 1Department of Accounting, Temple University, Philadelphia, Pennsylvania 19122, USA; email: [email protected]. 2Carlson School of Management, University of Minnesota, Minneapolis, Minnesota 55455, USA; email: [email protected]. 3Department of Health Management and Policy, University of Michigan, Ann Arbor, Michigan 48109-2029, USA; email: [email protected]
      Annual Review of Public Health Vol. 40: 487 - 500
      • ...Emerging opportunities in unsupervised machine learning may yield seismic shifts in how knowledge is generated and discovered (54)....
    • Using Electronic Health Records for Population Health Research: A Review of Methods and Applications

      Joan A. Casey,1 Brian S. Schwartz,2,3 Walter F. Stewart,4 and Nancy E. Adler5. 1Robert Wood Johnson Foundation Health and Society Scholars Program at the University of California, San Francisco, and the University of California, Berkeley, Berkeley, California 94720-7360; email: [email protected]. 2Departments of Environmental Health Sciences and Epidemiology, Bloomberg School of Public Health, and the Department of Medicine, School of Medicine, Johns Hopkins University, Baltimore, Maryland 21205; email: [email protected]. 3Center for Health Research, Geisinger Health System, Danville, Pennsylvania 17822. 4Research, Development and Dissemination, Sutter Health, Walnut Creek, California 94596; email: [email protected]. 5Center for Health and Community and the Department of Psychiatry, University of California, San Francisco, California 94118; email: [email protected]
      Annual Review of Public Health Vol. 37: 61 - 81
      • ...high-volume EHR data to guide decision making for individual patients (56)....
      • ...These could include improved capture of social/behavioral (2), environmental, and genetic data (56)...
    • Omics and Drug Response

      Urs A. Meyer,1 Ulrich M. Zanger,2 and Matthias Schwab2,3. 1Division of Pharmacology and Neurobiology, Biozentrum of the University of Basel, CH-4056 Basel, Switzerland; email: [email protected]. 2Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology, D-70376 Stuttgart, Germany; email: [email protected], [email protected]. 3Department of Clinical Pharmacology, University Hospital Tübingen, D-72076 Tübingen, Germany
      Annual Review of Pharmacology and Toxicology Vol. 53: 475 - 502
      • ...EMRs can enable the comparison of treatments and outcomes in large cohorts of patients in real-world settings (135)....
      • ...We conclude that EMRs linked to omics data and biobanks are becoming an important resource in the study of drug responses (135, 140)....
      • ...EHRs linked to repositories of biological samples of large patient cohorts will be an important tool to produce the data needed to conceive prospective biomarker-guided studies regarding clinical relevance and clinical utility (135) (Figure 3). ...

  • 33. 
    Kanter JM, Veeramachaneni K. 2015. Deep feature synthesis: towards automating data science endeavors. In 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paris, pp. 1–10. New York: IEEE
  • 34. 
    Keefe JR, Wiltsey Stirman S, Cohen ZD, DeRubeis RJ, Smith BN, Resick PA. 2018. In rape trauma PTSD, patient characteristics indicate which trauma-focused treatment they are most likely to complete. Depress. Anxiety 35(4):330–38
  • 35. 
    Kind AJH, Jencks S, Brock J, Yu M, Bartels C, et al. 2014. Neighborhood socioeconomic disadvantage and 30-day rehospitalization: a retrospective cohort study. Ann. Intern. Med. 161(11):765–74
  • 36. 
    Künzel SR, Sekhon JS, Bickel PJ, Yu B. 2019. Metalearners for estimating heterogeneous treatment effects using machine learning. PNAS 116:4156–65
    More AR articles citing this reference

    • Machine Learning for Social Science: An Agnostic Approach

      Justin Grimmer,1 Margaret E. Roberts,2 and Brandon M. Stewart3. 1Department of Political Science and Hoover Institution, Stanford University, Stanford, California 94305, USA; email: [email protected]. 2Department of Political Science and Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, California 92093, USA; email: [email protected]. 3Department of Sociology and Office of Population Research, Princeton University, Princeton, New Jersey 08540, USA; email: [email protected]
      Annual Review of Political Science Vol. 24: 395 - 419
      • ...a growing literature uses machine learning methods to estimate heterogeneous treatment effects: how the effect of a particular intervention differs across characteristics of individuals (Imai & Ratkovic 2013, Athey & Imbens 2016, Grimmer et al. 2017, Wager & Athey 2018, Künzel et al. 2019)....

  • 37. 
    Maier-Hein L, Eisenmann M, Reinke A, Onogur S, Stankovic M, et al. 2018. Why rankings of biomedical image analysis competitions should be interpreted with care. Nat. Commun. 9(1):5217
    More AR articles citing this reference

    • Quantitative Molecular Positron Emission Tomography Imaging Using Advanced Deep Learning Techniques

      Habib Zaidi1,2,3,4 and Issam El Naqa5,6,7. 1Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, 1211 Geneva, Switzerland; email: [email protected]. 2Geneva Neuroscience Centre, University of Geneva, 1205 Geneva, Switzerland. 3Department of Nuclear Medicine and Molecular Imaging, University of Groningen, 9700 RB Groningen, Netherlands. 4Department of Nuclear Medicine, University of Southern Denmark, DK-5000 Odense, Denmark. 5Department of Machine Learning, Moffitt Cancer Center, Tampa, Florida 33612, USA. 6Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan 48109, USA. 7Department of Oncology, McGill University, Montreal, Quebec H3A 1G5, Canada
      Annual Review of Biomedical Engineering Vol. 23: 249 - 276
      • ...Criticisms raised in a detailed report evaluating 150 biomedical image analysis challenges performed prior to the end of 2016 include irregularities in quality, assessment, reproducibility, and ranking (133), ...

  • 38. 
    Messier KP, Wheeler DC, Flory AR, Jones RR, Patel D, et al. 2019. Modeling groundwater nitrate exposure in private wells of North Carolina for the Agricultural Health Study. Sci. Total Environ. 655:512–19
  • 39. 
    Mocanu DC, Mocanu E, Stone P, Nguyen PH, Gibescu M, Liotta A. 2018. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat. Commun. 9(1):2383
  • 40. 
    Motsinger-Reif AA, Dudek SM, Hahn LW, Ritchie MD. 2008. Comparison of approaches for machine-learning optimization of neural networks for detecting gene-gene interactions in genetic epidemiology. Genet. Epidemiol. 32(4):325–40
  • 41. 
    Nat. Commun. Editors. 2018. Epidemiology is a science of high importance. Nat. Commun. 9(1):1703
  • 42. 
    Penone C, Davidson AD, Shoemaker KT, Di Marco M, Rondinini C, et al. 2014. Imputation of missing data in life-history trait datasets: Which approach performs the best? Methods Ecol. Evol. 5(9):961–70
  • 43. 
    Pereira S, Meier R, McKinley R, Wiest R, Alves V, et al. 2018. Enhancing interpretability of automatically extracted machine learning features: application to a RBM-random forest system on brain lesion segmentation. Med. Image Anal. 44:228–44
  • 44. 
    Powers S, Qian J, Jung K, Schuler A, Shah NH, et al. 2018. Some methods for heterogeneous treatment effect estimation in high dimensions. Stat. Med. 37(11):1767–87
  • 45. 
    R. Soc. (G. B.). 2017. Machine learning: the power and promise of computers that learn by example. Rep. DES4702, R. Soc. G. B., London. https://royalsociety.org/-/media/policy/projects/machine-learning/publications/machine-learning-report.pdf
  • 46. 
    Ramaswami R, Bayer R, Galea S. 2018. Precision medicine from a public health perspective. Annu. Rev. Public Health 39:153–68
  • 47. 
    Raudys SJ, Jain AK. 1991. Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Trans. Pattern Anal. Mach. Intell. 13(3):252–64
  • 48. 
    Ribeiro MT, Singh S, Guestrin C. 2016. “Why should I trust you?”: explaining the predictions of any classifier. In KDD ’16 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–44. New York: ACM
    More AR articles citing this reference

    • Deep Learning in Biomedical Data Science

      Pierre Baldi. Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA; email: [email protected]
      Annual Review of Biomedical Data Science Vol. 1: 181 - 205
      • ... or by training a simple classifier using examples in the close neighborhood of a test example (158)....
    • Big Data in Public Health: Terminology, Machine Learning, and Privacy

      Stephen J. Mooney1 and Vikas Pejaver2. 1Harborview Injury Prevention and Research Center, University of Washington, Seattle, Washington 98122, USA; email: [email protected]. 2Department of Biomedical Informatics and Medical Education and the eScience Institute, University of Washington, Seattle, Washington 98109, USA; email: [email protected]
      Annual Review of Public Health Vol. 39: 95 - 112
      • ...More sophisticated approaches such as Local Interpretable Model-agnostic Explanations (LIME) (105)...

  • 49. 
    Ronca E, Scheel-Sailer A, Koch HG, Gemperli A, SwiSCI Study Group, et al. 2017. Health care utilization in persons with spinal cord injury: part 2—determinants, geographic variation and comparison with the general population. Spinal Cord 55(9):828–33
  • 50. 
    Rosenblatt F. 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6):386–408
    More AR articles citing this reference

    • Online Learning Algorithms

      Nicolò Cesa-Bianchi1 and Francesco Orabona2. 1Department of Computer Science and Data Science Research Center, Università degli Studi di Milano, Milano 20133, Italy; email: [email protected]. 2Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, USA; email: [email protected]
      Annual Review of Statistics and Its Application Vol. 8: 165 - 190
      • ...The analysis of OGD can be easily adapted to derive a regret bound for the perceptron algorithm of Rosenblatt (1958) for binary classification, ...
    • Machine-Learning Quantum States in the NISQ Era

      Giacomo Torlai1 and Roger G. Melko2,3. 1Center for Computational Quantum Physics, Flatiron Institute, New York, NY 10010, USA; email: [email protected]. 2Department of Physics and Astronomy, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; email: [email protected]. 3Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada
      Annual Review of Condensed Matter Physics Vol. 11: 325 - 344
      • ...the perceptron, was proposed by Frank Rosenblatt as early as 1958 (10)....
    • Machine Learning for Fluid Mechanics

      Steven L. Brunton,1 Bernd R. Noack,2,3 and Petros Koumoutsakos4. 1Department of Mechanical Engineering, University of Washington, Seattle, Washington 98195, USA. 2LIMSI (Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur), CNRS UPR 3251, Université Paris-Saclay, F-91403 Orsay, France. 3Institut für Strömungsmechanik und Technische Akustik, Technische Universität Berlin, D-10634 Berlin, Germany. 4Computational Science and Engineering Laboratory, ETH Zurich, CH-8092 Zurich, Switzerland; email: [email protected]
      Annual Review of Fluid Mechanics Vol. 52: 477 - 508
      • ...machines such as the perceptron (Rosenblatt 1958) aimed to automate processes such as classification and regression....
      • ...It is perhaps the oldest method for learning, starting with the perceptron (Rosenblatt 1958), ...
    • Computational Modeling of Phonological Learning

      Gaja Jarosz. Department of Linguistics, University of Massachusetts, Amherst, Massachusetts 01003-1100, USA; email: [email protected]
      Annual Review of Linguistics Vol. 5: 67 - 90
      • ... has been particularly influential and forms the basis of a number of the frequency-sensitive models discussed below. “Error-driven” (Gibson & Wexler 1994, Rosenblatt 1958, Wexler & Culicover 1980) means that updates to the grammar are triggered when the learner's own predicted output fails to match the observed output in the learning data....
    • Deep Learning in Biomedical Data Science

      Pierre Baldi. Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA; email: [email protected]
      Annual Review of Biomedical Data Science Vol. 1: 181 - 205
      • ...direct descendants of early efforts to model the brain and cognitions in the 1940s and 1950s (1, 2)....
    • Machine Learning Approaches for Clinical Psychology and Psychiatry

      Dominic B. Dwyer, Peter Falkai, and Nikolaos KoutsoulerisDepartment of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany; email: [email protected], [email protected], [email protected]
      Annual Review of Clinical Psychology Vol. 14: 91 - 118
      • ...The aim of the creators of the perceptron was to build a machine that could develop its own formulae to solve problems through experience (Rosenblatt 1958)....
    • Computational Neuroscience: Mathematical and Statistical Perspectives

      Robert E. Kass,1 Shun-Ichi Amari,2 Kensuke Arai,3 Emery N. Brown,4,5 Casey O. Diekman,6 Markus Diesmann,7,8 Brent Doiron,9 Uri T. Eden,3 Adrienne L. Fairhall,10 Grant M. Fiddyment,3 Tomoki Fukai,2 Sonja Grün,7,8 Matthew T. Harrison,11 Moritz Helias,7,8 Hiroyuki Nakahara,2 Jun-nosuke Teramae,12 Peter J. Thomas,13 Mark Reimers,14 Jordan Rodu,15 Horacio G. Rotstein,16,17 Eric Shea-Brown,10 Hideaki Shimazaki,18,19 Shigeru Shinomoto,19 Byron M. Yu,20 and Mark A. Kramer31Department of Statistics, Machine Learning Department, and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA; email: [email protected]2Mathematical Neuroscience Laboratory, RIKEN Brain Science Institute, Wako, Saitama Prefecture 351-0198, Japan3Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA4Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA5Department of Anesthesia, Harvard Medical School, Boston, Massachusetts 02115, USA6Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, New Jersey 07102, USA7Institute of Neuroscience and Medicine, Jülich Research Centre, 52428 Jülich, Germany8Department of Theoretical Systems Neurobiology, Institute of Biology, RWTH Aachen University, 52062 Aachen, Germany9Department of Mathematics, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA10Department of Physiology and Biophysics, University of Washington, Seattle, Washington 98105, USA11Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA12Department of Integrated Theoretical Neuroscience, Osaka University, Suita, Osaka Prefecture 565-0871, Japan13Department of Mathematics, Applied Mathematics, and Statistics, Case Western Reserve University, Cleveland, Ohio 44106, USA14Department of Neuroscience, Michigan State University, East Lansing, Michigan 48824, USA15Department of 
Statistics, University of Virginia, Charlottesville, Virginia 22904, USA16Federated Department of Biological Sciences, Rutgers University/New Jersey Institute of Technology, Newark, New Jersey 07102, USA17Institute for Brain and Neuroscience Research, New Jersey Institute of Technology, Newark, New Jersey 07102, USA18Honda Research Institute Japan, Wako, Saitama Prefecture 351-0188, Japan19Department of Physics, Kyoto University, Kyoto, Kyoto Prefecture 606-8502, Japan20Department of Electrical and Computer Engineering and Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
      Annual Review of Statistics and Its Application Vol. 5: 183 - 214
      • ...A different kind of computational architecture, developed by Rosenblatt (1958), combined the McCulloch–Pitts conception with a learning rule based on ideas articulated by Hebb (Hebb 1949)...
    • Deep Neural Networks: A New Framework for Modeling Biological Vision and Brain Information Processing

      Nikolaus KriegeskorteMedical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom; email: [email protected]
      Annual Review of Vision Science Vol. 1: 417 - 446
      • ...the nonlinearity was simply a step function (McCulloch & Pitts 1943, Rosenblatt 1958, Minsky & Papert 1972), ...
    • Compressed Sensing, Sparsity, and Dimensionality in Neuronal Information Processing and Data Analysis

      Surya Ganguli1 and Haim Sompolinsky2,31Department of Applied Physics, Stanford University, Stanford, California 94305; email: [email protected]2Edmond and Lily Safra Center for Brain Sciences, Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel; email: [email protected]3Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138
      Annual Review of Neuroscience Vol. 35: 485 - 508
      • ...Such a model is equivalent to the classical perceptron (Rosenblatt 1958)....
    • Multivariate Analysis Methods in Particle Physics

      Pushpalatha C. BhatFermi National Accelerator Laboratory, Batavia, Illinois 60510; email: [email protected]
      Annual Review of Nuclear and Particle Science Vol. 61: 281 - 309
      • ...particularly in Frank Rosenblatt's (31) creation of the Perceptron around 1960....

  • 51. 
    Sadilek A, Caty S, DiPrete L, Mansour R, Schenk T Jr., et al. 2018. Machine-learned epidemiology: real-time detection of foodborne illness at scale. npj Digit. Med. 1(1):36
  • 52. 
Saito T, Rehmsmeier M. 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10(3):e0118432
  • 53. 
    Samuel AL. 1959. Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3(3):210–29
  • 54. 
    Seligman B, Tuljapurkar S, Rehkopf D. 2018. Machine learning approaches to the social determinants of health in the health and retirement study. SSM - Popul. Health 4:95–99
  • 55. 
    Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H. 2014. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am. J. Epidemiol. 179(6):764–74
  • 56. 
    Shickel B, Loftus TJ, Adhikari L, Ozrazgat-Baslanti T, Bihorac A, Rashidi P. 2019. DeepSOFA: a continuous acuity score for critically ill patients using clinically interpretable deep learning. Sci. Rep. 9(1):1879
  • 57. 
    Shmueli G. 2010. To explain or to predict? Stat. Sci. 25(3):289–310
  • 58. 
    Singh GK. 2003. Area deprivation and widening inequalities in US mortality, 1969–1998. Am. J. Public Health 93(7):1137–43
  • 59. 
    Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. 2014. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15:1929–58
  • 60. 
Stekhoven DJ, Bühlmann P. 2012. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1):112–18
  • 61. 
    Tessmer HL, Ito K, Omori R. 2018. Can machines learn respiratory virus epidemiology?: A comparative study of likelihood-free methods for the estimation of epidemiological dynamics. Front. Microbiol. 9:343
  • 62. 
    Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58(1):267–88
  • 63. 
    Tsai C-F, Eberle W, Chu C-Y. 2013. Genetic algorithms in feature and instance selection. Knowl.-Based Syst. 39:240–47
  • 64. 
    Turing AM. 1950. Computing machinery and intelligence. Mind 59(236):433–60
  • 65. 
    van der Ploeg T, Austin PC, Steyerberg EW. 2014. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med. Res. Methodol. 14(1):137
  • 66. 
    Wager S, Athey S. 2018. Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113:1228–42
  • 67. 
    Wang Y, Wang D, Ye X, Wang Y, Yin Y, Jin Y. 2019. A tree ensemble-based two-stage model for advanced-stage colorectal cancer survival prediction. Inf. Sci. 474:106–24
  • 68. 
    Wiemken TL, Carrico RM, Furmanek SP, Guinn BE, Mattingly WA, et al. The impact of socioeconomic position on the incidence, severity, and clinical outcomes of hospitalized patients with community-acquired pneumonia. Public Health Rep. In press
  • 69. 
    Wiemken TL, Furmanek SP, Mattingly WA, Guinn BE, Cavallazzi R, et al. 2017. Predicting 30-day mortality in hospitalized patients with community-acquired pneumonia using statistical and machine learning approaches. J. Respir. Infect. 1(3):50–56
  • 70. 
    Wiemken TL, Kelley RR, Fernandez-Botran R, Mattingly WA, Arnold FW, et al. 2017. Using cluster analysis of cytokines to identify patterns of inflammation in hospitalized patients with community-acquired pneumonia: a pilot study. J. Respir. Infect. 1(1):3–11
  • 71. 
    Wiemken TL, Kelley RR, Mattingly WA, Ramirez JA. 2019. Clinical research in pneumonia: role of artificial intelligence. J. Respir. Infect. 3(1):1–4

TERMS AND DEFINITIONS

Feature:

a variable (or set of variables) used by the model to predict an outcome, or used in a clustering or data reduction algorithm

Feature engineering:

the process of creating new features from existing features using mathematical and various combinatorial approaches

Feature selection:

the process of selecting a subset of features to include in a machine learning model

Hyperparameter:

a setting required by many machine learning algorithms that is adjusted to fine-tune or optimize the training of the model

Label:

the outcome variable in a supervised machine learning model

Machine learning:

an umbrella term encompassing a multitude of algorithms used for prediction and estimation of treatment effects

Performance:

the statistical metrics computed from a machine learning model, used to assess how well the model achieves its intended purpose

Supervised machine learning:

a machine learning approach focused on predicting an outcome of interest

Testing:

the process of passing an independent data set (typically the remaining data not used in the training of the model) to the trained model, producing various performance metrics

Training:

the supervised machine learning process of fitting a predictive model using a subset of the data, often around 70% of it

Unsupervised (machine learning):

a machine learning approach with no outcome variable, used for clustering and data reduction
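The training and testing definitions above can be sketched in a few lines of Python. This is a minimal illustration only; the function name, the 70% fraction, and the random seed are our own choices, not prescribed by the article:

```python
import random

def train_test_split(records, train_frac=0.7, seed=42):
    """Split records into training and testing subsets.

    The training subset is used to fit the model; the held-out
    testing subset is later passed to the trained model to
    produce performance metrics.
    """
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 70 30
```

In practice a library routine (e.g., scikit-learn's `train_test_split`) would be used instead, with stratification when the outcome classes are imbalanced.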

Footnotes:

Copyright © 2020 by Annual Reviews. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See credit lines of images or other third party material in this article for license information.

Figure 1  The end-to-end supervised machine learning workflow.


Figure Locations

...Below, we describe the end-to-end process of machine learning (Figure 1), ...
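As a toy sketch of that end-to-end workflow (assemble data, split, train, test, evaluate), using a majority-class baseline in place of a real learner — all names and numbers here are illustrative assumptions, not from the article:

```python
import random

# Assemble a toy binary-outcome dataset (imbalanced: 70 positives, 30 negatives)
rng = random.Random(0)
labels = [1] * 70 + [0] * 30
rng.shuffle(labels)

# Training/testing split: roughly 70% of the data is used for training
cut = int(len(labels) * 0.7)
train, test = labels[:cut], labels[cut:]

# "Training" the baseline: predict the most frequent label seen in training
majority = max(set(train), key=train.count)

# Testing: the proportion of held-out labels the baseline predicts correctly
accuracy = sum(1 for y in test if y == majority) / len(test)
print(majority, round(accuracy, 2))
```

Any real model (logistic regression, random forest, neural network) slots into the same split/train/test skeleton; only the training and prediction steps change.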


Table 1  Linking terms and phrases in epidemiology and machine learning

Term in epidemiology and biostatistics | Term in machine/statistical learning
Dependent variable; outcome variable; response variable | Label/class
Independent variable; predictor variable; explanatory variable | Feature
Contingency table; 2 × 2 table | Confusion matrix
Sensitivity | Recall
Positive predictive value | Precision
Deep learning | Artificial neural network with more than 1 hidden layer
Outcome group with the highest frequency | Majority class
Outcome group with the lowest frequency | Minority class
Proportion of cases in each category of the outcome variable (when the outcome is categorical) | Class balance
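The sensitivity/recall and positive predictive value/precision correspondence in Table 1 can be verified directly from a 2 × 2 contingency (confusion) matrix; the cell counts below are made up for illustration:

```python
# 2 x 2 contingency table (confusion matrix) for a binary classifier
#                predicted +   predicted -
#   actual +         tp            fn
#   actual -         fp            tn
tp, fn, fp, tn = 80, 20, 10, 90

sensitivity = tp / (tp + fn)  # called "recall" in machine learning
ppv = tp / (tp + fp)          # called "precision" in machine learning

print(round(sensitivity, 2), round(ppv, 2))  # 0.8 0.89
```

The same two formulas underpin the precision-recall plots recommended for imbalanced data (reference 52).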