Annual Review of Statistics and Its Application - Volume 10, 2023
A Review of Generalizability and Transportability
Irina Degtiar and Sherri Rose
Vol. 10 (2023), pp. 501–524
When assessing causal effects, determining the target population to which the results are intended to generalize is a critical decision. Randomized and observational studies each have strengths and limitations for estimating causal effects in a target population. Estimates from randomized data may have internal validity but are often not representative of the target population. Observational data may better reflect the target population, and hence be more likely to have external validity, but are subject to potential bias due to unmeasured confounding. While much of the causal inference literature has focused on addressing internal validity bias, both internal and external validity are necessary for unbiased estimates in a target population. This article presents a framework for addressing external validity bias, including a synthesis of approaches for generalizability and transportability, and the assumptions they require, as well as tests for the heterogeneity of treatment effects and differences between study and target populations.
Three-Decision Methods: A Sensible Formulation of Significance Tests—and Much Else
Vol. 10 (2023), pp. 525–546
For real-valued parameters, significance tests can be motivated as three-decision methods, in which we either assert the sign of the parameter above or below a specified null value, or say nothing either way. Tukey viewed this as a “sensible formulation” of tests, unlike the widely taught null hypothesis significance testing (NHST) system that is today's default. We review the three-decision framework, collecting the substantial literature on how other statistical tools can be usefully motivated in this way. These tools include close Bayesian analogs of frequentist power calculations, p-values, confidence intervals, and multiple testing corrections. We also show how three-decision arguments can straightforwardly resolve some well-known difficulties in the interpretation and criticism of testing results. Explicit results are shown for simple conjugate analyses, but the methods discussed apply generally to real-valued parameters.
Second-Generation Functional Data
Vol. 10 (2023), pp. 547–572
Modern studies from a variety of fields record multiple functional observations according to either multivariate, longitudinal, spatial, or time series designs. We refer to such data as second-generation functional data because their analysis (unlike typical functional data analysis, which assumes independence of the functions) accounts for the complex dependence between the functional observations and requires more advanced methods. In this article, we provide an overview of the techniques for analyzing second-generation functional data with a focus on highlighting the key methodological intricacies that stem from the need for modeling complex dependence, compared with independent functional data. For each of the four types of second-generation functional data presented (multivariate functional data, longitudinal functional data, functional time series, and spatially functional data), we discuss how the widely popular functional principal component analysis can be extended to these settings to define, identify main directions of variation, and describe dependence among the functions. In addition to modeling, we also discuss prediction, statistical inference, and application to clustering. We close by discussing future directions in this area.
Model-Based Clustering
Vol. 10 (2023), pp. 573–595
Clustering is the task of automatically gathering observations into homogeneous groups, where the number of groups is unknown. Through its basis in a statistical modeling framework, model-based clustering provides a principled and reproducible approach to clustering. In contrast to heuristic approaches, model-based clustering allows for robust approaches to parameter estimation and objective inference on the number of clusters, while providing a clustering solution that accounts for uncertainty in cluster membership. The aim of this article is to provide a review of the theory underpinning model-based clustering, to outline associated inferential approaches, and to highlight recent methodological developments that facilitate the use of model-based clustering for a broad array of data types. Since its emergence six decades ago, the literature on model-based clustering has grown rapidly, and as such, this review provides only a selection of the bibliography in this dynamic and impactful field.
Model Diagnostics and Forecast Evaluation for Quantiles
Vol. 10 (2023), pp. 597–621
Model diagnostics and forecast evaluation are closely related tasks, with the former concerning in-sample goodness (or lack) of fit and the latter addressing predictive performance out-of-sample. We review the ubiquitous setting in which forecasts are cast in the form of quantiles or quantile-bounded prediction intervals. We distinguish unconditional calibration, which corresponds to classical coverage criteria, from the stronger notion of conditional calibration, as can be visualized in quantile reliability diagrams. Consistent scoring functions (including, but not limited to, the widely used asymmetric piecewise linear score or pinball loss) provide for comparative assessment and ranking, and link to the coefficient of determination and skill scores. We illustrate the use of these tools on Engel's food expenditure data, the Global Energy Forecasting Competition 2014, and the US COVID-19 Forecast Hub.
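For readers unfamiliar with the pinball loss named in this abstract, the following is a minimal sketch, not taken from the article. It assumes the common unscaled convention in which an outcome y at or above the quantile forecast q costs alpha * (y - q) and an outcome below it costs (1 - alpha) * (q - y). The function names `pinball_loss`, `mean_pinball_loss`, and `empirical_coverage` are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def pinball_loss(y: float, q: float, alpha: float) -> float:
    """Pinball loss of quantile forecast q at level alpha for outcome y.

    Assumed convention (not from the article): no extra scaling factor.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    diff = y - q
    # Outcomes above the forecast are weighted by alpha,
    # outcomes below it by (1 - alpha).
    return alpha * diff if diff >= 0 else (alpha - 1.0) * diff


def mean_pinball_loss(ys: Sequence[float], qs: Sequence[float], alpha: float) -> float:
    """Average pinball loss over paired outcomes and quantile forecasts."""
    if len(ys) != len(qs) or len(ys) == 0:
        raise ValueError("ys and qs must be non-empty and of equal length")
    return sum(pinball_loss(y, q, alpha) for y, q in zip(ys, qs)) / len(ys)


def empirical_coverage(ys: Sequence[float], qs: Sequence[float]) -> float:
    """Fraction of outcomes at or below their quantile forecast.

    Comparing this fraction with alpha is a simple unconditional
    calibration (coverage) check.
    """
    if len(ys) != len(qs) or len(ys) == 0:
        raise ValueError("ys and qs must be non-empty and of equal length")
    return sum(1 for y, q in zip(ys, qs) if y <= q) / len(ys)
```

A lower mean pinball loss indicates a better quantile forecast under this score, while an empirical coverage close to alpha is what unconditional calibration asks for.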
Statistical Methods for Exoplanet Detection with Radial Velocities
Vol. 10 (2023), pp. 623–649
Exoplanets can be detected with various observational techniques. Among them, radial velocity (RV) has the key advantages of revealing the architecture of planetary systems and measuring planetary mass and orbital eccentricities. RV observations are poised to play a key role in the detection and characterization of Earth twins. However, the detection of such small planets is not yet possible due to very complex, temporally correlated instrumental and astrophysical stochastic signals. Furthermore, exploring the large parameter space of RV models exhaustively and efficiently presents difficulties. In this review, we frame RV data analysis as a problem of detection and parameter estimation in unevenly sampled, multivariate time series. The objective of this review is two-fold: to introduce the motivation, methodological challenges, and numerical challenges of RV data analysis to nonspecialists, and to unify the existing advanced approaches in order to identify areas for improvement.
Statistical Applications to Cognitive Diagnostic Testing
Vol. 10 (2023), pp. 651–675
Diagnostic classification tests are designed to assess examinees’ discrete mastery status on a set of skills or attributes. Such tests have gained increasing attention in educational and psychological measurement. We review diagnostic classification models and their applications to testing and learning, discuss their statistical and machine learning connections and related challenges, and introduce some contemporary and future extensions.
Player Tracking Data in Sports
Vol. 10 (2023), pp. 677–697
There has been rapid growth in the collection of player tracking data in recent years. These data, providing spatiotemporal locations of players and ball at high resolution, have spurred methodological developments in a range of sports. There have been impacts in the development of player performance measurement (e.g., distance traveled) and in the attribution of value to specific plays (e.g., expected points from a given position) or even specific actions within a play. This review highlights key methodological contributions via statistical and machine learning approaches. The studies and outcomes discussed show how sports can be a playground for extending analytical techniques in a range of areas. The review also describes the ongoing methodological challenges associated with the use of tracking data.
Six Statistical Senses
Vol. 10 (2023), pp. 699–725
This article proposes a set of categories, each one representing a particular distillation of important statistical ideas. Each category is labeled a “sense” because we think of these as essential in helping every statistical mind connect in constructive and insightful ways with statistical theory, methodologies, and computation, toward the ultimate goal of building statistical phronesis. The illustration of each sense with statistical principles and methods provides a sensical tour of the conceptual landscape of statistics, as a leading discipline in the data science ecosystem.