Abstract

Manifold learning (ML), also known as nonlinear dimension reduction, is a set of methods for finding the low-dimensional structure of data. Dimension reduction for large, high-dimensional data is not merely a way to compress the data; the new representations and descriptors obtained by ML reveal the geometric shape of high-dimensional point clouds and allow one to visualize, denoise, and interpret them. This review presents the underlying principles of ML, its representative methods, and their statistical foundations, all from a practicing statistician's perspective. It describes the trade-offs involved and what theory tells us about the parameter and algorithmic choices one makes in order to obtain reliable conclusions.
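To make the setting concrete, below is a minimal sketch of nonlinear dimension reduction on synthetic data. It assumes the scikit-learn library is available; the swiss-roll data set, the neighborhood size n_neighbors=10, and the target dimension 2 are illustrative choices for this sketch, not recommendations from the review.

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap, SpectralEmbedding

    # Sample points from a two-dimensional manifold (the classic swiss
    # roll) embedded in R^3; t is the intrinsic coordinate along the roll.
    X, t = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

    # Isomap: embeds the points so that graph-geodesic distances,
    # computed over a k-nearest-neighbor graph, are approximately preserved.
    Y_isomap = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

    # Laplacian Eigenmaps (SpectralEmbedding in scikit-learn): embeds the
    # points using the bottom eigenvectors of a graph Laplacian built on
    # the same kind of neighborhood graph.
    Y_le = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)

    # Each row of Y_isomap or Y_le gives 2-dimensional coordinates for the
    # corresponding point in X; plotting either embedding colored by t
    # shows how well the intrinsic parameterization is recovered.

The neighborhood size used to build the graph is precisely the kind of parameter choice whose effect on the reliability of the resulting embedding is the subject of the trade-offs discussed in the review.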

