Abstract

Differential privacy is widely regarded as the formal standard for privacy-preserving data analysis due to its robust and rigorous guarantees, and it has seen increasingly broad adoption in public services, academia, and industry. Although differential privacy originated in the cryptographic context, in this review we argue that, fundamentally, it can be considered a purely statistical concept. Leveraging Blackwell's informativeness theorem, we demonstrate, building on prior work, that all definitions of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but the right language for reasoning about differential privacy. This insight leads to the definition of f-differential privacy, which extends other definitions of differential privacy through a representation theorem. We review techniques that render f-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this definition to private deep learning, private convex optimization, shuffled mechanisms, and US Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared with existing alternatives.
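
As a minimal sketch of the central definition that the abstract refers to (following Dong et al. 2022, Reference 25 below), consider a mechanism $M$ and neighboring datasets $S$ and $S'$. Distinguishing the output distributions $P = M(S)$ and $Q = M(S')$ is a binary hypothesis testing problem, and the trade-off function records the smallest type II error achievable at each type I error level:
\[
T(P, Q)(\alpha) \;=\; \inf_{\phi}\,\bigl\{\beta_{\phi} : \alpha_{\phi} \le \alpha\bigr\},
\]
where the infimum runs over all tests $\phi$, with type I error $\alpha_{\phi} = \mathbb{E}_{P}[\phi]$ and type II error $\beta_{\phi} = 1 - \mathbb{E}_{Q}[\phi]$. A mechanism $M$ is $f$-differentially private if
\[
T\bigl(M(S), M(S')\bigr) \;\ge\; f \quad \text{for all neighboring } S, S'.
\]
Classical $(\varepsilon, \delta)$-differential privacy is the special case $f_{\varepsilon,\delta}(\alpha) = \max\{0,\, 1 - \delta - e^{\varepsilon}\alpha,\, e^{-\varepsilon}(1 - \delta - \alpha)\}$, and Gaussian differential privacy takes $f = G_{\mu} = T\bigl(\mathcal{N}(0,1), \mathcal{N}(\mu,1)\bigr)$.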

Literature Cited

1. Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, et al. 2016. Deep learning with differential privacy. In CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–18. New York: ACM
2. Abowd JM. 2018. The US Census Bureau adopts differential privacy. In KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, p. 2867. New York: ACM
3. Abowd JM, Ashmead R, Cumings-Menon R, Garfinkel S, Heineck M, et al. 2022. The 2020 Census disclosure avoidance system TopDown algorithm. Harv. Data Sci. Rev. 2. https://doi.org/10.1162/99608f92.529e3cb9
4. Aktay A, Bavadekar S, Cossoul G, Davis J, Desfontaines D, et al. 2020. Google COVID-19 community mobility reports: anonymization process description (version 1.1). arXiv:2004.04145 [cs.CR]
5. Altschuler J, Talwar K. 2022. Privacy of noisy stochastic gradient descent: more iterations without more privacy loss. In NIPS'22: Proceedings of the 36th International Conference on Neural Information Processing Systems, ed. S Koyejo, S Mohamed, A Agarwal, D Belgrave, K Cho, A Oh, pp. 3788–800. Red Hook, NY: Curran
6. Apple Inc. 2017. Learning with privacy at scale. Tech. Rep., Dep. Mach. Learn., Apple Inc., Cupertino, CA
7. Awan J, Dong J. 2022. Log-concave and multivariate canonical noise distributions for differential privacy. In NIPS'22: Proceedings of the 36th International Conference on Neural Information Processing Systems, ed. S Koyejo, S Mohamed, A Agarwal, D Belgrave, K Cho, A Oh, pp. 34229–40. Red Hook, NY: Curran
8. Awan J, Ramasethu A. 2023. Optimizing noise for f-differential privacy via anti-concentration and stochastic dominance. arXiv:2308.08343 [cs.CR]
9. Awan J, Vadhan S. 2023. Canonical noise distributions and private hypothesis tests. Ann. Stat. 51(2):547–72
10. Balle B, Barthe G, Gaboardi M. 2018. Privacy amplification by subsampling: tight analyses via couplings and divergences. In NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, ed. S Bengio, HM Wallach, H Larochelle, K Grauman, N Cesa-Bianchi, pp. 6280–90. Red Hook, NY: Curran
11. Barber RF, Duchi J. 2014. Privacy: a few definitional aspects and consequences for minimax mean-squared error. In 53rd IEEE Conference on Decision and Control, pp. 1365–69. Piscataway, NJ: IEEE
12. Bassily R, Smith A, Thakurta A. 2014. Private empirical risk minimization: efficient algorithms and tight error bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 464–73. Piscataway, NJ: IEEE
13. Blackwell D. 1950. Comparison of experiments. Tech. Rep., Howard Univ., Washington, DC
14. Blackwell D, Girshick MA. 1979. Theory of Games and Statistical Decisions. Mineola, NY: Dover
15. Bok J, Su W, Altschuler JM. 2024. Shifted interpolation for differential privacy. Proc. Mach. Learn. Res. 235:4230–66
16. Bu Z, Dong J, Long Q, Su WJ. 2020. Deep learning with Gaussian differential privacy. Harv. Data Sci. Rev. 2(3). https://doi.org/10.1162/99608f92.cfc5dd25
17. Bun M, Dwork C, Rothblum GN, Steinke T. 2018. Composable and versatile privacy via truncated CDP. In STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 74–86. New York: ACM
18. Bun M, Steinke T. 2016. Concentrated differential privacy: simplifications, extensions, and lower bounds. In Theory of Cryptography: 14th International Conference, TCC 2016-B, Beijing, China, October 31–November 3, 2016, Proceedings, ed. M Hirt, A Smith, pp. 635–58. New York: Springer
19. Canonne CL, Kamath G, Steinke T. 2020. The discrete Gaussian for differential privacy. In NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems, ed. H Larochelle, M Ranzato, R Hadsell, MF Balcan, H Lin, pp. 15676–88. Red Hook, NY: Curran
20. Chaudhuri K, Monteleoni C, Sarwate AD. 2011. Differentially private empirical risk minimization. J. Mach. Learn. Res. 12:1069–109
21. Cummings R, Desfontaines D, Evans D, Geambasu R, Huang Y, et al. 2024. Advancing differential privacy: where we are now and future directions for real-world deployment. Harv. Data Sci. Rev. 6(1). https://doi.org/10.1162/99608f92.d3197524
22. Desfontaines D, Pejó B. 2019. SoK: differential privacies. arXiv:1906.01337 [cs.CR]
23. Ding B, Kulkarni J, Yekhanin S. 2017. Collecting telemetry data privately. In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, ed. U von Luxburg, I Guyon, S Bengio, H Wallach, R Fergus, pp. 3574–83. Red Hook, NY: Curran
24. Dinur I, Nissim K. 2003. Revealing information while preserving privacy. In PODS '03: Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 202–10. New York: ACM
25. Dong J, Roth A, Su WJ. 2022. Gaussian differential privacy. J. R. Stat. Soc. Ser. B 84(1):3–37
26. Dong J, Su W, Zhang L. 2021. A central limit theorem for differentially private query answering. In NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems, ed. M Ranzato, A Beygelzimer, Y Dauphin, PS Liang, J Wortman Vaughan, pp. 14759–70. Red Hook, NY: Curran
27. Dwork C. 2006. Differential privacy. In Automata, Languages and Programming: 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10–14, ed. M Bugliesi, B Preneel, V Sassone, I Wegener, pp. 1–12. New York: Springer
28. Dwork C, Feldman V, Hardt M, Pitassi T, Reingold O, Roth A. 2017a. Guilt-free data reuse. Commun. ACM 60(4):86–93
29. Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M. 2006a. Our data, ourselves: privacy via distributed noise generation. In Advances in Cryptology—EUROCRYPT 2006, ed. S Vaudenay, pp. 486–503. New York: Springer
30. Dwork C, McSherry F, Nissim K, Smith A. 2006b. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4–7, ed. S Halevi, T Rabin, pp. 265–84. New York: Springer
31. Dwork C, Roth A. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4):211–407
32. Dwork C, Rothblum GN. 2016. Concentrated differential privacy. arXiv:1603.01887 [cs.DS]
33. Dwork C, Smith A, Steinke T, Ullman J. 2017b. Exposed! A survey of attacks on private data. Annu. Rev. Stat. Appl. 4:61–84
34. Erlingsson Ú, Feldman V, Mironov I, Raghunathan A, Talwar K, Thakurta A. 2019. Amplification by shuffling: from local to central differential privacy via anonymity. In SODA '19: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2468–79. Philadelphia, PA: SIAM
35. Erlingsson Ú, Pihur V, Korolova A. 2014. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In CCS '14: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–67. New York: ACM
36. Feldman V, McMillan A, Talwar K. 2022. Hiding among the clones: a simple and nearly optimal analysis of privacy amplification by shuffling. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pp. 954–64. Piscataway, NJ: IEEE
37. Feldman V, McMillan A, Talwar K. 2023. Stronger privacy amplification by shuffling for Rényi and approximate differential privacy. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 4966–81. Philadelphia, PA: SIAM
38. Feldman V, Mironov I, Talwar K, Thakurta A. 2018. Privacy amplification by iteration. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 521–32. Piscataway, NJ: IEEE
39. Gopi S, Lee YT, Wutschitz L. 2021. Numerical composition of differential privacy. In NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems, ed. M Ranzato, A Beygelzimer, Y Dauphin, PS Liang, J Wortman Vaughan, pp. 11631–42. Red Hook, NY: Curran
40. Homer N, Szelinger S, Redman M, Duggan D, Tembe W, et al. 2008. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLOS Genet. 4(8):e1000167
41. Jiang Y, Chang X, Liu Y, Ding L, Kong L, Jiang B. 2023. Gaussian differential privacy on Riemannian manifolds. In NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, ed. A Oh, T Naumann, A Globerson, K Saenko, M Hardt, S Levine, pp. 14665–84. Red Hook, NY: Curran
42. Kairouz P, Oh S, Viswanath P. 2017. The composition theorem for differential privacy. IEEE Trans. Inform. Theory 63(6):4037–49
43. Kingma DP, Ba J. 2014. Adam: a method for stochastic optimization. arXiv:1412.6980 [cs.LG]
44. Koskela A, Jälkö J, Honkela A. 2020. Computing tight differential privacy guarantees using FFT. Proc. Mach. Learn. Res. 108:2560–69
45. LeCun Y, Cortes C, Burges CJC. 1998. The MNIST database of handwritten digits. Test and Training Datasets, Courant Inst., New York Univ., New York, NY. http://yann.lecun.com/exdb/mnist/
46. Lin Y, Ma Y, Wang YX, Redberg R, Bu Z. 2024. Tractable MCMC for private learning with pure and Gaussian differential privacy. arXiv:2310.14661 [cs.LG]
47. Liu Y, Sun K, Jiang B, Kong L. 2022. Identification, amplification and measurement: a bridge to Gaussian differential privacy. In NIPS'22: Proceedings of the 36th International Conference on Neural Information Processing Systems, ed. S Koyejo, S Mohamed, A Agarwal, D Belgrave, K Cho, A Oh, pp. 11410–22. Red Hook, NY: Curran
48. McSherry F, Talwar K. 2007. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), pp. 94–103. Piscataway, NJ: IEEE
49. Mironov I. 2017. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263–75. Piscataway, NJ: IEEE
50. Nasr M, Hayes J, Steinke T, Balle B, Tramèr F, et al. 2023. Tight auditing of differentially private machine learning. In Proceedings of the 32nd USENIX Security Symposium, pp. 1631–48. Berkeley, CA: USENIX
51. Near J, Darais D. 2022. Differential privacy: future work & open challenges. Cybersecurity Insights: A NIST Blog, Jan. 24. https://www.nist.gov/blogs/cybersecurity-insights/differential-privacy-future-work-open-challenges
52. Rogers R, Subramaniam S, Peng S, Durfee D, Lee S, et al. 2020. LinkedIn's Audience Engagements API: a privacy preserving data analytics system at scale. arXiv:2002.05839 [cs.CR]
53. Shokri R, Stronati M, Song C, Shmatikov V. 2017. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. Piscataway, NJ: IEEE
54. Song S, Chaudhuri K, Sarwate AD. 2013. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pp. 245–48. Piscataway, NJ: IEEE
55. Su B, Su WJ, Wang C. 2024. The 2020 United States decennial census is more private than you (might) think. arXiv:2410.09296 [cs.CR]
56. Ullman J. 2021. Statistical inference is not a privacy violation. DifferentialPrivacy.org Blog, June 3. https://differentialprivacy.org/inference-is-not-a-privacy-violation/
58. Wang C, Su B, Ye J, Shokri R, Su W. 2023. Unified enhancement of privacy bounds for mixture mechanisms via f-differential privacy. In NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, ed. A Oh, T Naumann, A Globerson, K Saenko, M Hardt, S Levine, pp. 55051–63. Red Hook, NY: Curran
59. Wang H, Gao S, Zhang H, Shen M, Su WJ. 2022. Analytical composition of differential privacy via the Edgeworth accountant. arXiv:2206.04236 [cs.CR]
60. Wang YX, Balle B, Kasiviswanathan SP. 2019. Subsampled Rényi differential privacy and analytical moments accountant. Proc. Mach. Learn. Res. 89:1226–35
61. Wasserman L, Zhou S. 2010. A statistical framework for differential privacy. J. Am. Stat. Assoc. 105(489):375–89
62. Xu Z, Zhang Y, Andrew G, Choquette-Choo CA, Kairouz P, et al. 2023. Federated learning of Gboard language models with differential privacy. arXiv:2305.18465 [cs.LG]
63. Zheng Q, Chen S, Long Q, Su W. 2021. Federated f-differential privacy. Proc. Mach. Learn. Res. 130:2251–59
64. Zhu Y, Dong J, Wang YX. 2022. Optimal accounting of differential privacy via characteristic function. Proc. Mach. Learn. Res. 151:4782–817