Abstract

Data, and hence data quality, transcend all boundaries of science, commerce, engineering, medicine, public health, and policy. Data quality has historically been addressed by controlling the measurement processes, controlling the data collection processes, and through data ownership. For many data sources being leveraged into data science, this approach to data quality may be challenged. To understand that challenge, a historical and disciplinary perspective on data quality, highlighting the evolution and convergence of data concepts and applications, is presented.

/content/journals/10.1146/annurev-statistics-060116-054114
2017-03-07
2024-06-18
  • Article Type: Review Article