Exposed! A Survey of Attacks on Private Data

Cynthia Dwork; Adam Smith; Thomas Steinke; Jonathan Ullman

doi:10.1146/annurev-statistics-060116-054123

Annual Review of Statistics and Its Application

Volume 4, 2017

Review Article

Affiliations: Cynthia Dwork, Microsoft Research, Mountain View, California 94043; Adam Smith, Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802; Thomas Steinke, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138; Jonathan Ullman, College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115
Vol. 4:61-84 (Volume publication date March 2017) https://doi.org/10.1146/annurev-statistics-060116-054123
First published as a Review in Advance on December 21, 2016
© Annual Reviews

Abstract

Privacy-preserving statistical data analysis addresses the general question of protecting privacy when publicly releasing information about a sensitive dataset. A privacy attack takes seemingly innocuous released information and uses it to discern the private details of individuals, thus demonstrating that such information compromises privacy. For example, re-identification attacks have shown that it is easy to link supposedly de-identified records to the identity of the individual concerned. This survey focuses on attacking aggregate data, such as statistics about how many individuals have a certain disease, genetic trait, or combination thereof. We consider two types of attacks: reconstruction attacks, which approximately determine a sensitive feature of all the individuals covered by the dataset, and tracing attacks, which determine whether or not a target individual's data are included in the dataset. We also discuss techniques from the differential privacy literature for releasing approximate aggregate statistics while provably thwarting any privacy attack.

Keyword(s): differential privacy, privacy, privacy attacks, re-identification, reconstruction attacks, tracing attacks
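To make the reconstruction attacks described in the abstract concrete, the Python sketch below implements a brute-force version of the classic subset-sum attack associated with Dinur and Nissim. A curator answers every subset-count query about n secret bits with error at most E. The attacker then outputs any bit vector that is consistent with all of the released answers. The curator function `answer_subset_queries`, the attacker function `reconstruct`, the parameter names, and the choice of uniformly random bounded noise are all hypothetical and chosen for illustration. This is not code from the survey.

```python
import random


def answer_subset_queries(secret_bits, noise_bound, rng):
    """Hypothetical curator that answers all 2**n subset-count queries.

    Query q is a bitmask selecting a subset of the n individuals; its true
    answer is how many selected individuals have secret bit 1. Each answer
    is released with integer noise drawn uniformly from
    [-noise_bound, noise_bound] (an illustrative assumption).
    """
    n = len(secret_bits)
    x = sum(bit << i for i, bit in enumerate(secret_bits))  # pack bits into an int
    answers = {}
    for q in range(1 << n):
        true_count = bin(x & q).count("1")
        answers[q] = true_count + rng.randint(-noise_bound, noise_bound)
    return answers


def reconstruct(n, answers, noise_bound):
    """Brute-force attacker: return the first candidate bit vector whose
    subset counts are within noise_bound of every released answer."""
    for candidate in range(1 << n):
        if all(abs(bin(candidate & q).count("1") - a) <= noise_bound
               for q, a in answers.items()):
            return [(candidate >> i) & 1 for i in range(n)]
    return None  # not reached when every answer really is within noise_bound


if __name__ == "__main__":
    rng = random.Random(0)
    n, E = 10, 1
    secret = [rng.randint(0, 1) for _ in range(n)]
    answers = answer_subset_queries(secret, E, rng)
    guess = reconstruct(n, answers, E)
    wrong = sum(s != g for s, g in zip(secret, guess))
    print(f"{wrong} of {n} bits wrong (guarantee: at most {4 * E})")
```

Here is why the bound holds. Let y be any vector consistent with every answer, and let S be the set of individuals with y_i = 1 and x_i = 0. Both y and the true x have subset counts within E of the released answer for S. The difference between those two counts over S is exactly |S|, so |S| is at most 2E. The symmetric set, where y_i = 0 and x_i = 1, adds at most another 2E, so y disagrees with x on at most 4E individuals. The argument uses only the bound on the error and not its distribution, so it applies to adversarial noise as well. The exhaustive search is exponential in n and is only practical for tiny datasets. Linear-programming decoders achieve comparable reconstruction with polynomially many queries.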
