1932

Abstract

Recent technological advances allow for the collection of massive data in the study of complex phenomena over time and/or space in various fields. Many of these data involve sequences of high-dimensional or non-Euclidean measurements, where change-point analysis is a crucial early step in understanding the data. Segmentation, or offline change-point analysis, divides data into homogeneous temporal or spatial segments, making subsequent analysis easier; its online counterpart detects changes in sequentially observed data, allowing for real-time anomaly detection. This article reviews a nonparametric change-point analysis framework that utilizes graphs representing the similarity between observations. This framework can be applied to data as long as a reasonable dissimilarity distance among the observations can be defined. Thus, this framework can be applied to a wide range of applications, from high-dimensional data to non-Euclidean data, such as imaging data or network data. In addition, analytic formulas can be derived to control the false discoveries, making them easy off-the-shelf data analysis tools.

Loading

Article metrics loading...

/content/journals/10.1146/annurev-statistics-122121-033817
2023-03-09
2024-04-29
Loading full text...

Full text loading...

/deliver/fulltext/statistics/10/1/annurev-statistics-122121-033817.html?itemId=/content/journals/10.1146/annurev-statistics-122121-033817&mimeType=html&fmt=ahah

Literature Cited

  1. Arlot S, Celisse A, Harchaoui Z 2019. A kernel multiple change-point algorithm via model selection. J. Mach. Learn. Res. 20:162)
    [Google Scholar]
  2. Barabâsi AL, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T. 2002. Evolution of the social network of scientific collaborations. Phys. A Stat. Mech. Appl. 311:3590–614
    [Google Scholar]
  3. Basseville M, Nikiforov IV. 1993. Detection of Abrupt Changes: Theory and Application Englewood Cliffs, NJ: Prentice Hall
  4. Bolton RJ, Hand DJ 2001. Unsupervised profiling methods for fraud detection. Proceedings of Credit Scoring and Credit Control VII235–55 Edinburgh: Univ. Edinburgh Credit Res. Cent.
    [Google Scholar]
  5. Brodsky E, Darkhovsky BS. 1993. Nonparametric Methods in Change Point Problems New York: Springer
  6. Cabeza R, Nyberg L. 2000. Imaging cognition II: An empirical review of 275 PET and fMRI studies. J. Cogn. Neurosci. 12:11–47
    [Google Scholar]
  7. Carlstein EG, Müller HG, Siegmund D. 1994. Change-point problems: Papers from the AMS-IMS-SIAM Summer Research Conference held at Mt. Holyoke College N.p.: Inst. Math. Stat.
  8. Celisse A, Marot G, Pierre-Jean M, Rigaill G 2018. New efficient algorithms for multiple change-point detection with reproducing kernels. Comput. Stat. Data Anal. 128:200–20
    [Google Scholar]
  9. Chan HP, Walther G 2015. Optimal detection of multi-sample aligned sparse signals. Ann. Stat. 43:51865–95
    [Google Scholar]
  10. Chandola V, Banerjee A, Kumar V. 2009. Anomaly detection: a survey. ACM Comput. Surv. 41:315
    [Google Scholar]
  11. Chang WC, Li CL, Yang Y, Póczos B 2019. Kernel change-point detection with auxiliary deep generative models. arXiv:1901.06077 [stat.ML]
  12. Chen H. 2019a. Change-point detection for multivariate and non-Euclidean data with local dependency. arXiv:1903.01598 [stat.ME]
  13. Chen H. 2019b. Sequential change-point detection based on nearest neighbors. Ann. Stat. 47:31381–407
    [Google Scholar]
  14. Chen H, Chen X, Su Y 2018. A weighted edge-count two-sample test for multivariate and object data. J. Am. Stat. Assoc. 113:1146–55
    [Google Scholar]
  15. Chen H, Friedman JH. 2017. A new graph-based two-sample test for multivariate and object data. J. Am. Stat. Assoc. 112:517397–409
    [Google Scholar]
  16. Chen H, Ren H, Yao F, Zou C. 2021. Data-driven selection of the number of change-points via error rate control. J. Am. Stat. Assoc. https://doi.org/10.1080/01621459.2021.1999820
    [Crossref] [Google Scholar]
  17. Chen H, Zhang N. 2015. Graph-based change-point detection. Ann. Stat. 43:1139–76
    [Google Scholar]
  18. Chen H, Zhang NR. 2013. Graph-based tests for two-sample comparisons of categorical data. Stat. Sin. 23:1479–503
    [Google Scholar]
  19. Chen J, Gupta AK. 2000. Parametric Statistical Change Point Analysis: With Applications to Genetics, Medicine, and Finance New York: Springer
  20. Chen LHY, Shao QM 2005. Stein's method for normal approximation. An Introduction to Stein's Method AD Barbour, LHY Chen 1–59 Singapore: World Sci.
    [Google Scholar]
  21. Chu L, Chen H. 2019. Asymptotic distribution-free change-point detection for multivariate and non-Euclidean data. Ann. Stat. 47:1382–414
    [Google Scholar]
  22. Chu L, Chen H. 2022. Sequential change-point detection for high-dimensional and non-Euclidean data. arXiv:1810.05973 [stat.ME]
  23. Collins RT, Lipton A, Kanade T, Fujiyoshi H, Duggins D et al. 2000. A system for video surveillance and monitoring Work. Pap. CMU-RI-TR-00-12 Robotics Inst., Carnegie Mellon Univ. Pittsburgh, PA:
  24. Cox D, Spjøtvoll E. 1982. On partitioning means into groups. Scand. J. Stat. 9:3147–52
    [Google Scholar]
  25. Csörgö M, Horváth L. 1997. Limit Theorems in Change-Point Analysis New York: Wiley
  26. Dai M, Zhang Z, Srivastava A. 2016. Testing stationarity of brain functional connectivity using change-point detection in fMRI data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops19–27 New York: IEEE
    [Google Scholar]
  27. Dehning J, Zierenberg J, Spitzner FP, Wibral M, Neto JP et al. 2020. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions. Science 369:6500
    [Google Scholar]
  28. Desobry F, Davy M, Doncarli C. 2005. An online kernel change detection algorithm. IEEE Trans. Sign. Proc. 53:82961–74
    [Google Scholar]
  29. Dong F, He Y, Wang T, Han D, Lu H, Zhao H. 2020. Predicting viral exposure response from modeling the changes of co-expression networks using time series gene expression data. BMC Bioinformatics 21:370
    [Google Scholar]
  30. Eagle N, Pentland A, Lazer D. 2009. Inferring friendship network structure by using mobile phone data. PNAS 106:3615274–78
    [Google Scholar]
  31. Frick K, Munk A, Sieling H. 2014. Multiscale change point inference. J. R. Stat. Soc. Ser. B 76:3495–580
    [Google Scholar]
  32. Friedman J, Rafsky L. 1979. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. Ann. Stat. 7:4697–717
    [Google Scholar]
  33. Fryzlewicz P. 2014. Wild binary segmentation for multiple change-point detection. Ann. Stat. 42:62243–81
    [Google Scholar]
  34. Fryzlewicz P. 2020. Narrowest significance pursuit: inference for multiple change-points in linear models. arXiv:2009.05431 [stat.ME]
  35. Garreau D, Arlot S. 2018. Consistent change-point detection with kernels. Electron. J. Stat. 12:24440–86
    [Google Scholar]
  36. Guia J, Wittlin C. 1999. Nine Problem Areas Concerning Tirant lo Blanc London: Tamesis
  37. Guimarães SJF, Couprie M, de Albuquerque Araújo A, Leite NJ 2003. Video segmentation based on 2D image analysis. Pattern Recognit. Lett. 24:7947–57
    [Google Scholar]
  38. Harchaoui Z, Cappé O. 2007. Retrospective multiple change-point estimation with kernels. 2007 IEEE/SP 14th Workshop on Statistical Signal Processing768–72 New York: IEEE
    [Google Scholar]
  39. Harchaoui Z, Moulines E, Bach FR 2009. Kernel change-point analysis. Advances in Neural Information Processing Systems 21 (NIPS 2008) D Koller, D Schuurmans, Y Bengio, L Bottou 609–16 Red Hook, NY: Curran
    [Google Scholar]
  40. Henze N. 1988. A multivariate two-sample test based on the number of nearest neighbor type coincidences. Ann. Stat. 16:2772–83
    [Google Scholar]
  41. Hu X, Wang Y, Wu Q. 2014. Multiple authors detection: a quantitative analysis of Dream of the Red Chamber. Adv. Adapt. Data Anal. 6:041450012
    [Google Scholar]
  42. Huang S, Ernberg I, Kauffman S. 2009. Cancer attractors: a systems view of tumors from a gene network dynamics and developmental perspective. Semin. Cell Dev. Biol. 20:869–76
    [Google Scholar]
  43. James B, James K, Siegmund D 1992. Asymptotic approximations for likelihood ratio tests and confidence regions for a change-point in the mean of a multivariate Gaussian process. Stat. Sin. 2:169–90
    [Google Scholar]
  44. Jun JJ, Steinmetz NA, Siegle JH, Denman DJ, Bauza M et al. 2017. Fully integrated silicon probes for high-density recording of neural activity. Nature 551:7679232–36
    [Google Scholar]
  45. Kappenman J. 2012. A perfect storm of planetary proportions. IEEE Spectrum 49:226–31
    [Google Scholar]
  46. Killick R, Fearnhead P, Eckley IA. 2012. Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 107:5001590–98
    [Google Scholar]
  47. Koprinska I, Carrato S. 2001. Temporal video segmentation: a survey. Signal Proc. Image Commun. 16:5477–500
    [Google Scholar]
  48. Kossinets G, Watts D. 2006. Empirical analysis of an evolving social network. Science 311:575788–90
    [Google Scholar]
  49. Kovács S, Li H, Bühlmann P, Munk A. 2020. Seeded binary segmentation: a general methodology for fast and optimal change point detection. arXiv:2002.06633 [stat.ME]
  50. Li ZN, Zhong X, Drew MS. 2002. Spatial–temporal joint probability images for video segmentation. Pattern Recognit. 35:91847–67
    [Google Scholar]
  51. Liu YW, Chen H. 2022. A fast and efficient change-point detection framework based on approximate k-nearest neighbor graphs. IEEE Trans. Signal Proc. 70:1976–86
    [Google Scholar]
  52. Londschien M, Bühlmann P, Kovács S. 2022. Random forests for change point detection. arXiv:2205.04997 [stat.ME]
  53. Long DG, Drinkwater MR, Holt B, Saatchi S, Bertoia C. 2001. Global ice and land climate studies using scatterometer image data. Eos 82:43503
    [Google Scholar]
  54. Lung-Yut-Fong A, Lévy-Leduc C, Cappé O. 2011. Homogeneity and change-point detection tests for multivariate data using rank statistics. arXiv:1107.1971 [math.ST]
  55. Malladi R, Kalamangalam GP, Aazhang B. 2013. Online Bayesian change point detection algorithms for segmentation of epileptic activity. 2013 Asilomar Conference on Signals, Systems and Computers1833–37 New York: IEEE
    [Google Scholar]
  56. Matteson DS, James NA. 2014. A nonparametric approach for multiple change point analysis of multivariate data. J. Am. Stat. Assoc. 109:505334–45
    [Google Scholar]
  57. Nie L, Nicolae DL. 2021. Weighted-graph-based change point detection. arXiv:2103.02680 [stat.ME]
  58. Niu YS, Zhang H. 2012. The screening and ranking algorithm to detect DNA copy number variations. Ann. Appl. Stat. 6:31306
    [Google Scholar]
  59. Olshen AB, Venkatraman E, Lucito R, Wigler M. 2004. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5:4557–72
    [Google Scholar]
  60. Pallotta G, Konjevod G, Cadena J, Nguyen P. 2017. Context-aided analysis of community evolution in networks. Stat. Anal. Data Min. 10:5290–311
    [Google Scholar]
  61. Pastor-Satorras R, Smith E, Solé RV 2003. Evolving protein interaction networks through gene duplication. J. Theor. Biol. 222:2199–210
    [Google Scholar]
  62. Pervaiz F, Pervaiz M, Rehman NA, Saif U. 2012. FluBreaks: early epidemic detection from Google flu trends. J. Med. Internet Res. 14:5e125
    [Google Scholar]
  63. Qu M, Shih FY, Jing J, Wang H. 2005. Automatic solar filament detection using image processing techniques. Solar Phys. 228:1–2119–35
    [Google Scholar]
  64. Radke RJ, Andra S, Al-Kofahi O, Roysam B 2005. Image change detection algorithms: a systematic survey. IEEE Trans. Image Proc. 14:3294–307
    [Google Scholar]
  65. Riba A, Ginebra J. 2006. Diversity of vocabulary and homogeneity of literary style. J. Appl. Stat. 33:7729–41
    [Google Scholar]
  66. Rosenbaum PR. 2005. An exact distribution-free test comparing two multivariate distributions based on adjacency. J. R. Stat. Soc. Ser. B 67:4515–30
    [Google Scholar]
  67. Shi X, Wu Y, Rao CR. 2018. Consistent and powerful non-Euclidean graph-based change-point test with applications to segmenting random interfered video data. PNAS 115:235914–19
    [Google Scholar]
  68. Siegmund D. 1985. Sequential Analysis: Tests and Confidence Intervals New York: Springer
  69. Siegmund D, Yakir B. 2007. The Statistics of Gene Mapping, Vol. 1 New York: Springer
  70. Siegmund D, Yakir B, Zhang N. 2011. Detecting simultaneous variant intervals in aligned sequences. Ann. Appl. Stat. 5:2A645–68
    [Google Scholar]
  71. Song H, Chen H. 2022. Asymptotic distribution-free change-point detection for data with repeated observations. Biometrika 109:783–98
    [Google Scholar]
  72. Srivastava M, Worsley K. 1986. Likelihood ratio tests for a change in the multivariate normal mean. J. Am. Stat. Assoc. 81:199–204
    [Google Scholar]
  73. Stringer C, Pachitariu M, Steinmetz N, Reddy CB, Carandini M, Harris KD. 2019. Spontaneous behaviors drive multidimensional, brainwide activity. Science 364:6437255
    [Google Scholar]
  74. Tartakovsky AG, Rozovskii BL, Blazek RB, Kim H. 2006. A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Trans. Signal Proc. 54:93372–82
    [Google Scholar]
  75. Tian YL, Lu M, Hampapur A. 2005. Robust and efficient foreground analysis for real-time video surveillance. Computer Vision and Pattern Recognition, 2005, Vol. 11182–87 New York: IEEE
    [Google Scholar]
  76. Tsirigos A, Rigoutsos I. 2005. A new computational method for the detection of horizontal gene transfer events. Nucleic Acids Res. 33:3922–33
    [Google Scholar]
  77. Vostrikova LY. 1981. Detecting “disorder” in multidimensional random processes. Doklady Akad. Nauk. 259:270–74
    [Google Scholar]
  78. Wagner A. 2001. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol. Biol. Evol. 18:71283–92
    [Google Scholar]
  79. Wang H, Tang M, Park Y, Priebe CE. 2014. Locality statistics for anomaly detection in time series of graphs. IEEE Trans. Signal Proc. 62:3703–17
    [Google Scholar]
  80. Wang T, Samworth RJ. 2018. High dimensional change point estimation via sparse projection. J. R. Stat. Soc. Ser. B 80:157–83
    [Google Scholar]
  81. Wong WK, Moore AW, Cooper GF, Wagner MM 2003. Bayesian network anomaly pattern detection for disease outbreaks. Proceedings of the 20th International Conference on Machine Learning (ICML-03) T Fawcett, N Mishra 808–15 Menlo Park, CA: AAAI
    [Google Scholar]
  82. Xie M, Han S, Tian B, Parvin S. 2011. Anomaly detection in wireless sensor networks: a survey. J. Netw. Comput. Appl. 34:41302–25
    [Google Scholar]
  83. Xie Y, Siegmund D. 2013. Sequential multi-sensor change-point detection. Information Theory and Applications Workshop (ITA), 2013448–67 New York: IEEE
    [Google Scholar]
  84. Zhang J, Chen H. 2022. Graph-based two-sample tests for data with repeated observations. Stat. Sin. 32:391–415
    [Google Scholar]
  85. Zhang M, Raghunathan A, Jha NK. 2013. MedMon: securing medical devices through wireless monitoring and anomaly detection. IEEE Trans. Biomed. Circ. Syst. 7:6871–81
    [Google Scholar]
  86. Zhang N, Siegmund D, Ji H, Li J 2010. Detecting simultaneous changepoints in multiple sequences. Biometrika 97:3631–45
    [Google Scholar]
  87. Zhang Y, Chen H. 2021. Graph-based multiple change-point detection. arXiv:2110.01170 [stat.ME]
  88. Zhou D, Chen H. 2022. RING-CPD: asymptotic distribution-free change-point detection for multivariate and non-Euclidean data. arXiv:2206.03038 [stat.ME]
  89. Zou C, Wang G, Li R. 2020. Consistent selection of the number of change-points via sample-splitting. Ann. Stat. 48:1)413–39
    [Google Scholar]
/content/journals/10.1146/annurev-statistics-122121-033817
Loading
/content/journals/10.1146/annurev-statistics-122121-033817
Loading

Data & Media loading...

  • Article Type: Review Article
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error