Differential Privacy and Federal Data Releases

Jerome P. Reiter

doi:10.1146/annurev-statistics-030718-105142

Annual Review of Statistics and Its Application

Volume 6, 2019

Review Article

Affiliation: Department of Statistical Science, Duke University, Durham, North Carolina 27705, USA
Vol. 6:85-101 (Volume publication date March 2019) https://doi.org/10.1146/annurev-statistics-030718-105142
First published as a Review in Advance on October 24, 2018
Copyright © 2019 by Annual Reviews. All rights reserved

Abstract

Federal statistics agencies strive to release data products that are informative for many purposes, yet also protect the privacy and confidentiality of data subjects’ identities and sensitive attributes. This article reviews the role that differential privacy, a disclosure risk criterion developed in the cryptography community, can and does play in federal data releases. The article describes potential benefits and limitations of using differential privacy for federal data, reviews current federal data products that satisfy differential privacy, and outlines research needed for adoption of differential privacy to become widespread among federal agencies.

Keywords: confidentiality, disclosure, government
