Abstract

In an era when external data and computational capabilities far exceed statistical agencies’ own resources and capabilities, these agencies face the renewed challenge of protecting the confidentiality of underlying microdata when publishing statistics in very granular form and of ensuring that these granular data are used for statistical purposes only. Conventional statistical disclosure limitation methods are too fragile to address this new challenge. This article discusses the deployment of a differential privacy framework for the 2020 US Census that was customized to protect confidentiality, particularly for the most detailed geographic and demographic categories, and to deliver controlled accuracy across the full geographic hierarchy.

/content/journals/10.1146/annurev-statistics-010422-034226
2023-03-09
2024-04-30
  • Article Type: Review Article