Annual Review of Statistics and Its Application - Volume 3, 2016
From CT to fMRI: Larry Shepp's Impact on Medical Imaging
Vol. 3 (2016), pp. 1–19. Larry Shepp worked extensively in the field of medical imaging for almost 40 years. He made seminal contributions to the areas of computed tomography (CT), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI). In this review, I highlight some of these important contributions, with the goal of illustrating the central role that mathematics and statistics played in their development.
League Tables for Hospital Comparisons
Vol. 3 (2016), pp. 21–50. We review statistical methods for estimating and interpreting league tables used to infer hospital quality, with a primary focus on methods for partitioning variation into two types: (a) within-hospital variation for a homogeneous group of patients and (b) between-hospital variation. We discuss the types of covariates included in the model, hierarchical and nonhierarchical logistic regression models for conducting inferences in a low-information context and their associated trade-offs, and the role of hospital volume. We use all-cause mortality rates for US hospitals to illustrate concepts and methods.
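To make the two variance components concrete, here is a minimal simulation sketch (my illustration, not code from the article), assuming a random-intercept logistic model; the number of hospitals, the volumes, the overall log-odds, and the between-hospital standard deviation are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

n_hospitals = 200
volume = rng.integers(50, 2000, size=n_hospitals)  # patients per hospital (illustrative)
mu = -2.5                                           # overall log-odds of death (illustrative)
sigma_between = 0.3                                 # SD of hospital effects (illustrative)

# Between-hospital variation: random intercepts on the logit scale.
hospital_effect = rng.normal(0.0, sigma_between, size=n_hospitals)
true_rate = 1.0 / (1.0 + np.exp(-(mu + hospital_effect)))

# Within-hospital variation: binomial sampling of deaths for a homogeneous
# patient group, so low-volume hospitals yield noisier observed rates.
deaths = rng.binomial(volume, true_rate)
observed_rate = deaths / volume

# The spread of observed rates exceeds the spread of true rates; the excess is
# within-hospital (sampling) variation that should not be misread as a
# quality difference.
print("SD of true hospital rates:    ", round(float(true_rate.std()), 4))
print("SD of observed hospital rates:", round(float(observed_rate.std()), 4))
```

Low-volume hospitals contribute most of the excess spread, which is one reason the abstract flags the role of hospital volume.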
Bayes and the Law
Vol. 3 (2016), pp. 51–77. Although the use of statistics in legal proceedings has grown considerably in the last 40 years, the methods used have been primarily classical rather than Bayesian. Yet the Bayesian approach avoids many of the problems of classical statistics and is also well suited to a broader range of problems. This article reviews the potential and actual use of Bayes in the law and explains the main reasons for its lack of impact on legal practice. These reasons include misconceptions by the legal community about Bayes' theorem, overreliance on the use of the likelihood ratio, and the lack of adoption of modern computational methods. We argue that Bayesian networks, which automatically produce the necessary Bayesian calculations, provide an opportunity to address most concerns about using Bayes in the law.
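For readers unfamiliar with the likelihood-ratio framing mentioned above, the odds form of Bayes' theorem is the standard statement; the numbers in the worked example are purely illustrative and not taken from the article.

```latex
\[
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
  \;=\;
\underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
  \times
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}},
\qquad \text{e.g.}\quad 1000 \times \frac{1}{10\,000} = \frac{1}{10}.
\]
```

Here H_p and H_d denote the prosecution and defense hypotheses and E the evidence; the example shows that a large likelihood ratio alone does not imply high posterior odds of guilt, which relates to the overreliance on the likelihood ratio noted above.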
There Is Individualized Treatment. Why Not Individualized Inference?
Keli Liu and Xiao-Li Meng. Vol. 3 (2016), pp. 79–111. Doctors use statistics to advance medical knowledge; we use a medical analogy to build statistical inference “from scratch” and to highlight an improvement. A doctor, perhaps implicitly, predicts a treatment's effectiveness for an individual patient based on its performance in a clinical trial; the trial patients serve as controls for that particular patient. The same logic underpins statistical inference: To identify the best statistical procedure to use for a problem, we simulate a set of control problems and evaluate candidate procedures on the controls. Recent interest in personalized/individualized medicine stems from the recognition that some clinical trial patients are better controls for a particular patient than others. Therefore, an individual patient's treatment decisions should depend only on a subset of relevant patients. Individualized statistical inference implements this idea for control problems (rather than for patients). Its potential for improving data analysis matches that of personalized medicine for improving health care. The central issue—for both individualized medicine and individualized inference—is how to make the right relevance-robustness trade-off: If we exercise too much judgment in determining which controls are relevant, our inferences will not be robust. How much is too much? We argue that the unknown answer is the Holy Grail of statistical inference.
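As a toy version of "evaluating candidate procedures on simulated controls" (my illustration, not the authors' example), the sketch below scores the sample mean and sample median on control problems drawn from an assumed Laplace noise model; the sample size, replication count, and noise model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n, n_controls, theta = 30, 5000, 0.0

# Control problems: simulated data sets whose true answer (theta) is known.
controls = theta + rng.laplace(0.0, 1.0, size=(n_controls, n))

# Score each candidate procedure by its mean squared error over the controls.
mse_mean = np.mean((controls.mean(axis=1) - theta) ** 2)
mse_median = np.mean((np.median(controls, axis=1) - theta) ** 2)

print(f"MSE of sample mean on controls:   {mse_mean:.4f}")
print(f"MSE of sample median on controls: {mse_median:.4f}")

# Individualized inference would restrict attention to the controls most
# relevant to the data set actually in hand, trading robustness for relevance.
```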
Data Sharing and Access
Vol. 3 (2016), pp. 113–132. Data sharing and access are venerable problems embedded in a rapidly changing milieu. Pressure points include the increasingly data-driven nature of science; the volume, complexity, and distributed nature of data; new concerns regarding privacy and confidentiality; and rising attention to the reproducibility of research. In the context of research data, this review surveys extant technologies, articulates a number of identified and emerging issues, and outlines one path for the future. Recognizing that data availability is a public good, research data archives can provide economic and scientific value to both data generators and data consumers in a way that engenders trust. The overall framework is statistical—the use of data for inference.
Data Visualization and Statistical Graphics in Big Data Analysis
Vol. 3 (2016), pp. 133–159. This article discusses the role of data visualization in the process of analyzing big data. We describe the historical origins of statistical graphics, from the birth of exploratory data analysis to its impact on practice today. We present examples of contemporary data visualizations in the process of exploring airline traffic, global standardized test scores, election monitoring, Wikipedia edits, the housing crisis as observed in San Francisco, and the mining of credit card databases. We provide a review of recent literature. Good data visualization yields better models and predictions and allows for the discovery of the unexpected.
Does Big Data Change the Privacy Landscape? A Review of the Issues
Vol. 3 (2016), pp. 161–180. The current data revolution is changing the conduct of social science research as increasing amounts of digital and administrative data become accessible for use. This new data landscape has created significant tension around data privacy and confidentiality. The risk–utility theory and models underpinning statistical disclosure limitation may be too restrictive for providing data confidentially, owing to the growing volumes and varieties of data and to evolving privacy policies. Science and society need to move to a trust-based approach from which both researchers and participants benefit. This review discusses the explosive growth of new data sources and the parallel evolution of privacy policy and governance, with a focus on access to data for research. We provide a history of privacy policy, statistical disclosure limitation research, and record linkage in the context of this brave new world of data.
Statistical Methods in Integrative Genomics
Vol. 3 (2016), pp. 181–209. Statistical methods in integrative genomics aim to answer important biological questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then we review statistical methods of integrative genomics, with emphasis on the motivation and rationale behind these methods. We conclude with some summary points and future research directions.
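As one concrete, deliberately simple example of horizontal integration (not drawn from the article), Fisher's method combines p-values for the same gene-level hypothesis across independent studies; the four p-values below are made up.

```python
import numpy as np
from scipy import stats

# One gene tested for the same hypothesis in four independent studies.
p_values = np.array([0.04, 0.20, 0.11, 0.03])

# Fisher's method: -2 * sum(log p) ~ chi-squared with 2k degrees of freedom
# under the global null hypothesis of no effect in any study.
statistic = -2.0 * np.sum(np.log(p_values))
combined_p = stats.chi2.sf(statistic, df=2 * len(p_values))

print(f"Fisher statistic: {statistic:.2f}, combined p-value: {combined_p:.4f}")
```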
On the Frequentist Properties of Bayesian Nonparametric Methods
Vol. 3 (2016), pp. 211–231. In this paper, I review the main results on the asymptotic properties of the posterior distribution in nonparametric or high-dimensional models. In particular, I explain how posterior concentration rates can be derived and what we learn from such analysis in terms of the impact of the prior distribution on high-dimensional models. These results concern fully Bayes and empirical Bayes procedures. I also describe some of the results that have been obtained recently in semiparametric models, focusing mainly on the Bernstein–von Mises property. Although these results are theoretical in nature, they shed light on some subtle behaviors of the prior models and sharpen our understanding of the family of functionals that can be well estimated for a given prior model.
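For reference, the posterior concentration rate discussed above is usually defined as follows (standard notation, not reproduced from the paper): a sequence epsilon_n is a contraction rate for the posterior at the true parameter theta_0 if

```latex
\[
\Pi\bigl(\theta : d(\theta,\theta_0) > M_n \varepsilon_n \,\bigm|\, X^{(n)}\bigr)
\;\longrightarrow\; 0
\quad \text{in } P_{\theta_0}^{(n)}\text{-probability}
\]
```

for every sequence M_n tending to infinity, where d is a suitable metric. The Bernstein–von Mises property strengthens this for finite-dimensional functionals, asserting that their marginal posterior is asymptotically Gaussian and centered at an efficient estimator.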
Statistical Model Choice
Vol. 3 (2016), pp. 233–256. Variable selection and model selection methods are indispensable statistical tools for almost any modeling question. This review first considers the use of information criteria for model selection. Such criteria order the candidate models so that the best model can be selected; different modeling goals might call for different criteria. Next, the effect of including a penalty in the estimation process is discussed. Third, nonparametric estimation is discussed, which involves several aspects of model choice, such as the choice of the estimator to use and the selection of tuning parameters. Fourth, model averaging approaches are reviewed, in which estimators from different models are weighted to produce one final estimator. There are several ways to choose the weights, and most of them result in data-driven, hence random, weights. Challenges for inference after model selection and inference for model-averaged estimators are discussed.
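A minimal sketch of the first and fourth ingredients above, information criteria and model averaging, written as my own illustration rather than code from the review: candidate polynomial regressions are ranked by AIC and BIC under a Gaussian likelihood, and smooth AIC weights are formed. The data-generating model and sample size are made-up choices.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(0.0, 1.0, size=n)

def gaussian_ic(degree):
    """Return (AIC, BIC) for a polynomial fit of the given degree,
    using the Gaussian likelihood with the ML variance estimate."""
    X = np.vander(x, degree + 1, increasing=True)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = np.mean(resid**2)                      # ML estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                                  # regression coefficients + variance
    return -2 * loglik + 2 * k, -2 * loglik + np.log(n) * k

degrees = [1, 2, 3, 4]
aic, bic = zip(*(gaussian_ic(d) for d in degrees))

# Smooth AIC model-averaging weights: exp(-0.5 * delta_AIC), normalized.
delta = np.array(aic) - min(aic)
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()

for d, a, b, w in zip(degrees, aic, bic, weights):
    print(f"degree {d}: AIC={a:8.2f}  BIC={b:8.2f}  AIC weight={w:.3f}")
```

The weights illustrate one data-driven (hence random) weighting scheme; the review discusses why inference after such selection or averaging is challenging.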
Functional Data Analysis
Vol. 3 (2016), pp. 257–295. With the advance of modern technology, more and more data are being recorded continuously during a time interval or intermittently at several discrete time points. These are both examples of functional data, which has become a commonly encountered type of data. Functional data analysis (FDA) encompasses the statistical methodology for such data. Broadly interpreted, FDA deals with the analysis and theory of data that are in the form of functions. This paper provides an overview of FDA, starting with simple statistical notions such as mean and covariance functions, then covering some core techniques, the most popular of which is functional principal component analysis (FPCA). FPCA is an important dimension reduction tool, and in sparse data situations it can be used to impute functional data that are sparsely observed. Other dimension reduction approaches are also discussed. In addition, we review another core technique, functional linear regression, as well as clustering and classification of functional data. Beyond linear and single- or multiple-index methods, we touch upon a few nonlinear approaches that are promising for certain applications. These include additive and other nonlinear functional regression models, as well as models that feature time warping, manifold learning, and empirical differential equations. The paper concludes with a brief discussion of future directions.
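The FPCA representation mentioned above rests on the Karhunen–Loève expansion, stated here in standard notation (my addition, not copied from the paper):

```latex
\[
X_i(t) \;=\; \mu(t) + \sum_{k=1}^{\infty} \xi_{ik}\,\phi_k(t),
\qquad
\xi_{ik} = \int \bigl(X_i(t) - \mu(t)\bigr)\,\phi_k(t)\,dt,
\]
```

where the phi_k are the orthonormal eigenfunctions of the covariance operator with eigenvalues lambda_1 >= lambda_2 >= ... and Var(xi_ik) = lambda_k. Truncating the sum at a small K gives the finite-dimensional representation used for dimension reduction and, in sparse designs, for imputing sparsely observed curves.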
Item Response Theory
Vol. 3 (2016), pp. 297–321. This review introduces classical item response theory (IRT) models as well as more contemporary extensions to the case of multilevel, multidimensional, and mixtures of discrete and continuous latent variables through the lens of discrete multivariate analysis. A general modeling framework is discussed, and the applications of this framework in diverse contexts are presented, including large-scale educational surveys, randomized efficacy studies, and diagnostic measurement. Other topics covered include parameter estimation and model fit evaluation. Both classical (numerical integration based) and more modern (stochastic) parameter estimation approaches are discussed. Similarly, limited-information goodness-of-fit testing and posterior predictive model checking are reviewed and contrasted. The review concludes with a discussion of some emerging strands in IRT research such as response time modeling, crossed random effects models, and nonstandard models for response processes.
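As a reminder of what a classical IRT model looks like, the two-parameter logistic (2PL) model is written below in standard notation (my addition, not quoted from the review):

```latex
\[
P\bigl(Y_{ij} = 1 \mid \theta_i\bigr)
  \;=\;
\frac{1}{1 + \exp\!\bigl(-a_j(\theta_i - b_j)\bigr)},
\]
```

where theta_i is the latent trait of respondent i and a_j, b_j are the discrimination and difficulty of item j. Constraining all a_j to be equal gives the Rasch (1PL) model, and the multilevel and multidimensional extensions covered in the review replace theta_i with a vector of traits or with nested random effects.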
Stochastic Processing Networks
Vol. 3 (2016), pp. 323–345. Stochastic processing networks arise as models in manufacturing, telecommunications, transportation, computer systems, the customer service industry, and biochemical reaction networks. Common characteristics of these networks are that they have entities—such as jobs, packets, vehicles, customers, or molecules—that move along routes, wait in buffers, receive processing from various resources, and are subject to the effects of stochastic variability through such quantities as arrival times, processing times, and routing protocols. The mathematical theory of queueing aims to understand, analyze, and control congestion in stochastic processing networks. In this article, we begin by summarizing some of the highlights in the development of the theory of queueing prior to 1990; this includes some exact analysis and development of approximate models for certain queueing networks. We then describe some surprises of the early 1990s and ensuing developments of the past 25 years related to the use of approximate models for analyzing the stability and performance of multiclass queueing networks. We conclude with a description of recent developments for more general stochastic processing networks and point to some open problems.
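As a reminder of the kind of exact pre-1990 result the summary alludes to, the textbook M/M/1 queue with arrival rate lambda and service rate mu (lambda < mu) has a closed-form stationary distribution; this example is mine, not taken from the article.

```latex
\[
\rho = \frac{\lambda}{\mu}, \qquad
P(N = n) = (1-\rho)\,\rho^{\,n}, \qquad
\mathbb{E}[N] = \frac{\rho}{1-\rho}, \qquad
\mathbb{E}[W] = \frac{\mathbb{E}[N]}{\lambda} \;\;\text{(Little's law)},
\]
```

where N is the number of entities in the system and W the time each spends there. The multiclass networks discussed in the article generally lack such closed forms, which is why approximate models are used to study their stability and performance.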
The US Federal Statistical System's Past, Present, and Future
Vol. 3 (2016), pp. 347–373. This article reviews the US federal statistical system from its roots in the colonial period and the early years of the federal republic through its growth in the nineteenth and twentieth centuries to the present day, including the coordination role played by the US Office of Management and Budget. The review highlights the innovations, benefits, and challenges of the federal statistical system and comments on the role played by major sources of data for the system, including censuses, probability surveys, administrative records, and newer sources. The article also assesses the strengths and weaknesses of the system from studies of the Committee on National Statistics (CNSTAT), a standing unit of the National Academies of Sciences, Engineering, and Medicine established in 1972 to link the academic community with federal statisticians and researchers. It concludes with observations on the future of the federal statistical system.
Are Survey Weights Needed? A Review of Diagnostic Tests in Regression Analysis
Vol. 3 (2016), pp. 375–392. Researchers apply sampling weights to take account of unequal sample selection probabilities, frame coverage errors, and nonresponse. If researchers do not weight when appropriate, they risk biased estimates. Alternatively, when they apply weights unnecessarily, they can create an inefficient estimator without reducing bias. Yet in practice researchers rarely test the necessity of weighting and are sometimes guided more by current practice in their field than by scientific evidence. In addition, statistical tests for weighting are not widely known or available. This article reviews empirical tests to determine whether weighted analyses are justified. We focus on regression models, though the review's implications extend beyond regression. We find that nearly all weighting tests fall into two categories: difference-in-coefficients tests and weight-association tests. We describe the distinguishing features of each category, present their properties, and explain the close relationship between them. We review the simulation evidence on their sampling properties in finite samples. Finally, we highlight the unanswered theoretical and practical questions that surround these tests and that deserve further research.
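A minimal sketch of a difference-in-coefficients style diagnostic, in the spirit of tests such as DuMouchel and Duncan's (my illustration, not code from the article): augment the unweighted regression with weight-by-covariate interactions and F-test that block. The simulated design, weights, and outcome below are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n = 500
x = rng.normal(size=n)
w = rng.uniform(1.0, 5.0, size=n)             # sampling weights (illustrative)
y = 2.0 + 1.0 * x + rng.normal(size=n)        # outcome unrelated to the weights

def rss(design, response):
    """Residual sum of squares from an OLS fit."""
    beta, _, _, _ = np.linalg.lstsq(design, response, rcond=None)
    return np.sum((response - design @ beta) ** 2)

X0 = np.column_stack([np.ones(n), x])          # restricted: unweighted model
X1 = np.column_stack([X0, w[:, None] * X0])    # add w and w*x interaction terms

q = X1.shape[1] - X0.shape[1]                  # number of added columns
df_resid = n - X1.shape[1]
f_stat = ((rss(X0, y) - rss(X1, y)) / q) / (rss(X1, y) / df_resid)
p_value = stats.f.sf(f_stat, q, df_resid)

# A small p-value suggests the weighted and unweighted fits differ, i.e. the
# weights appear informative and a weighted analysis is the safer choice.
print(f"F = {f_stat:.3f}, p-value = {p_value:.3f}")
```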