Abstract

Federal statistics agencies strive to release data products that are informative for many purposes, yet also protect the privacy and confidentiality of data subjects’ identities and sensitive attributes. This article reviews the role that differential privacy, a disclosure risk criterion developed in the cryptography community, can and does play in federal data releases. The article describes potential benefits and limitations of using differential privacy for federal data, reviews current federal data products that satisfy differential privacy, and outlines research needed for adoption of differential privacy to become widespread among federal agencies.
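
The article itself contains no code. For readers unfamiliar with the criterion, the sketch below is a minimal, hypothetical illustration of epsilon-differential privacy via the Laplace mechanism applied to a counting query; the function name, the example records, and the choice of epsilon are assumptions for illustration only, not a description of the article's methods or of any agency's implementation. A counting query changes by at most one when a single record is added or removed, so adding Laplace noise with scale 1/epsilon to the true count satisfies epsilon-differential privacy for that query.

    import numpy as np

    def laplace_count(data, predicate, epsilon, rng=None):
        """Release a counting query under epsilon-differential privacy.

        A count changes by at most 1 when one record is added or removed
        (sensitivity 1), so Laplace noise with scale 1/epsilon suffices
        for this single query.
        """
        rng = rng or np.random.default_rng()
        true_count = sum(1 for record in data if predicate(record))
        return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

    # Hypothetical data: respondents reporting income above $100,000.
    records = [{"income": 120_000}, {"income": 45_000}, {"income": 101_000}]
    noisy = laplace_count(records, lambda r: r["income"] > 100_000, epsilon=0.5)
    print(f"True count is 2; noisy release: {noisy:.2f}")

Smaller values of epsilon give stronger protection at the cost of noisier releases, and answering additional queries consumes more of the overall privacy budget through composition.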
