Privacy-preserving statistical data analysis addresses the general question of protecting privacy when publicly releasing information about a sensitive dataset. A privacy attack takes seemingly innocuous released information and uses it to discern the private details of individuals, thus demonstrating that such information compromises privacy. For example, re-identification attacks have shown that it is easy to link supposedly de-identified records to the identity of the individual concerned. This survey focuses on attacking aggregate data, such as statistics about how many individuals have a certain disease, genetic trait, or combination thereof. We consider two types of attacks: reconstruction attacks, which approximately determine a sensitive feature of all the individuals covered by the dataset, and tracing attacks, which determine whether or not a target individual's data are included in the dataset. We also discuss techniques from the differential privacy literature for releasing approximate aggregate statistics while provably thwarting any privacy attack.


Article metrics loading...

Loading full text...

Full text loading...


Literature Cited

  1. Bassily R, Nissim K, Smith A, Stemmer U, Ullman J. 2016. Algorithmic stability for adaptive data analysis. Proc. 48th Annu. ACM SIGACT Symp. Theory Comput.1046–59 New York: ACM [Google Scholar]
  2. Blum A, Ligett K, Roth A. 2008. A learning theory approach to non-interactive database privacy. Proc. 40th Annu. ACM Symp. Theory Comput.609–18 New York: ACM [Google Scholar]
  3. Bun M, Steinke T. 2016. Concentrated differential privacy: Simplifications, extensions, and lower bounds. arXiv1605.02065 [cs.CR]
  4. Calandrino J, Kilzer A, Narayanan A, Felten E, Shmatikov V. 2011. “You might also like:” privacy risks of collaborative filtering. Proc. 2011 IEEE Symp. Secur. Priv.231–46 Washington, DC: IEEE [Google Scholar]
  5. Chor B, Fiat A, Naor M. 1994. Tracing traitors. Proc. 14th Annu. Int. Cryptol. Conf. Adv. Cryptol.257–70 London: Springer-Verlag [Google Scholar]
  6. De A. 2012. Lower bounds in differential privacy. Proc. 9th Intl. Conf. Theory Cryptogr., TCC 2012, ed. R Cramer 321–38 Berlin: Springer-Verlag [Google Scholar]
  7. Dinur I, Nissim K. 2003. Revealing information while preserving privacy. Proc. 22nd ACM SIGMOD-SIGACT-SIGART Symp. Princ. Database Syst.202–10 New York: ACM [Google Scholar]
  8. Dwork C. 2006. Differential privacy. Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10–14, 2006, Proceedings, Part II eds. M Bugliesi, B Preneel, V Sassone, I Wegener 1–13 Berlin: Springer-Verlag [Google Scholar]
  9. Dwork C, Feldman V, Hardt M, Pitassi T, Reingold O, Roth A. 2015a. Generalization in adaptive data analysis and holdout reuse. Advances in Neural Information Processing Systems 28 ed. C Cortes, ND Lawrence, DD Lee, M Sugiyama, R Garnett 2350–58 Red Hook, NY: Curran [Google Scholar]
  10. Dwork C, Feldman V, Hardt M, Pitassi T, Reingold O, Roth A. 2015b. Preserving statistical validity in adaptive data analysis. Proc. 47th Annu. ACM Symp. Theory Comput.117–26 New York: ACM [Google Scholar]
  11. Dwork C, Feldman V, Hardt M, Pitassi T, Reingold O, Roth A. 2015c. The reusable holdout: preserving validity in adaptive data analysis. Science 349:636–38 [Google Scholar]
  12. Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M. 2006a. Our data, ourselves: privacy via distributed noise generation. Advances in Cryptology - EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28 - June 1, 2006. Proceedings ed. S Vaudenay 486–503 Berlin: Springer [Google Scholar]
  13. Dwork C, McSherry F, Nissim K, Smith A. 2006b. Calibrating noise to sensitivity in private data analysis. Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4–7, 2006. Proceedings ed. S Halevi, T Rabin 265–84 Berlin: Springer [Google Scholar]
  14. Dwork C, McSherry F, Talwar K. 2007. The price of privacy and the limits of LP decoding. Proc. 39th Annu. ACM Symp. Theory Comput.85–94 New York: ACM [Google Scholar]
  15. Dwork C, Naor M. 2008. On the difficulties of disclosure prevention in statistical databases or the case for differential privacy. J. Priv. Confid. 2:18 [Google Scholar]
  16. Dwork C, Naor M, Reingold O, Rothblum GN, Vadhan SP. 2009. On the complexity of differentially private data release: efficient algorithms and hardness results. Proc. 41st Annu. ACM Symp. Theory Comput.381–90 New York: ACM [Google Scholar]
  17. Dwork C, Roth A. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9:211–407 [Google Scholar]
  18. Dwork C, Rothblum G. 2016. Concentrated differential privacy. arXiv:1603.01887 [cs.DS]
  19. Dwork C, Smith A, Steinke T, Ullman J, Vadhan S. 2015d. Robust traceability from trace amounts. IEEE 56th Ann. Symp. on Foundations of Computer Science (FOCS), Berkeley, CA, Oct. 18–20 http://ieeexplore.ieee.org/document/7354420/ [Google Scholar]
  20. Dwork C, Yekhanin S. 2008. New efficient attacks on statistical disclosure control mechanisms. Proc. 28th Annu. Conf. Cryptology: Adv. Cryptol.469–480 Berlin: Springer-Verlag [Google Scholar]
  21. Hardt M, Miklau G, Pierce B, Roth A. 2012. Slides and video for tutorial presentations DIMACS Worksh. Recent Work on Differ. Priv. across Comput. Sci., Oct. 24–26 http://dimacs.rutgers.edu/Workshops/DifferentialPrivacy/Slides/slides.html [Google Scholar]
  22. Hardt M, Rothblum G. 2010. A multiplicative weights mechanism for privacy-preserving data analysis. Proc. 2010 IEEE 51st Ann. Symp. Found. Comput. Sci.61–70 Washington, DC: IEEE [Google Scholar]
  23. Hardt M, Ullman J. 2014. Preventing false discovery in interactive data analysis is hard. 2014 IEEE 55th Annu. Symp. Found. Comput. Sci.454–63 Washington, DC: IEEE [Google Scholar]
  24. Homer N, Szelinger S, Redman M, Duggan D, Tembe W. et al. 2008. Resolving individuals contributing trace amounts of dna to highly complex mixtures using high-density snp genotyping microarrays. PLOS Genet. 4:e1000167 [Google Scholar]
  25. Kasiviswanathan SP, Rudelson M, Smith A. 2013. The power of linear reconstruction attacks. Proc. 24th Annu. ACM-SIAM Symp. Discret. Algorithms1415–33 Philadelphia: SIAM [Google Scholar]
  26. Kasiviswanathan SP, Rudelson M, Smith A, Ullman J. 2010. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. Proc. 42nd ACM Symp. Theory Comput.775–84 New York: ACM [Google Scholar]
  27. Ligett K. 2013. Slides and archived video tutorials. Simons Inst. Worksh. on Big Data and Differ. Priv., Berkeley, CA, Dec. 11–14. https://simons.berkeley.edu/workshops/schedule/78 [Google Scholar]
  28. Muthukrishnan S, Nikolov A. 2012. Optimal private halfspace counting via discrepancy. Proc. 44th Annu. ACM Symp. Theory Comput.1285–92 New York: ACM [Google Scholar]
  29. Narayanan A, Shmatikov V. 2008. Robust de-anonymization of large sparse datasets. Proc. 2008 IEEE Symp. Secur. Priv.111–125 Washington, DC: IEEE [Google Scholar]
  30. Neyman J, Pearson ES. 1933. On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. R. Soc. A 231:289–337 [Google Scholar]
  31. Nikolov A, Talwar K, Zhang L. 2013. The geometry of differential privacy: the sparse and approximate cases. Proc. 45th Annu. ACM Symp. Theory Comput.351–60 New York: ACM [Google Scholar]
  32. Pres. Counc. Advis. Sci. Technol. 2014. Report to the President: Big Data and Privacy: A Technological Perspective. Washington, DC: Executive Off. Pres. [Google Scholar]
  33. Quadrianto N, Smola AJ, Caetano TS, Le QV. 2009. Estimating labels from label proportions. J. Mach. Learn. Res. 10:2349–74 [Google Scholar]
  34. Sankararaman S, Obozinski G, Jordan MI, Halperin E. 2009. Genomic privacy and limits of individual detection in a pool. Nat. Genet. 41:965–67 [Google Scholar]
  35. Steinke T, Ullman J. 2015. Interactive fingerprinting codes and the hardness of preventing false discovery. JMLR Worksh. Conf. Proc. 40:1–41 [Google Scholar]
  36. Sweeney L. 1997. Weaving technology and policy together to maintain confidentiality. J. Law Med. Ethics 25:98–110 [Google Scholar]
  37. Vadhan S. 2016. The complexity of differential privacy Work. Pap., Cent. Res. Comput. Soc., Harvard Univ. http://privacytools.seas.harvard.edu/publications/complexity-differential-privacy [Google Scholar]
  38. Wasserman L, Zhou S. 2010. A statistical framework for differential privacy. J. Am. Stat. Assoc. 105:375–89 [Google Scholar]
  39. Yu F. 2015. Scalable privacy-preserving data sharing methodology for genome-wide association studies. PhD thesis, Carnegie Mellon Univ. [Google Scholar]

Data & Media loading...

  • Article Type: Review Article
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error