
Abstract

Data, and hence data quality, transcend all boundaries of science, commerce, engineering, medicine, public health, and policy. Data quality has historically been addressed by controlling the measurement processes, controlling the data collection processes, and through data ownership. For many data sources being leveraged into data science, this approach to data quality may be challenged. To understand that challenge, a historical and disciplinary perspective on data quality, highlighting the evolution and convergence of data concepts and applications, is presented.

/content/journals/10.1146/annurev-statistics-060116-054114
2017-03-07
2024-03-29
  • Article Type: Review Article