Abstract

Phonetics is the scientific field concerned with the study of how speech is produced, heard, and perceived. It abounds with data, such as acoustic speech recordings, neuroimaging data, and articulatory data. In this article, we provide an introduction to different areas of phonetics (acoustic phonetics, sociophonetics, speech perception, articulatory phonetics, speech inversion, sound change, and speech technology), an overview of the statistical methods for analyzing their data, and an introduction to the signal processing methods commonly applied to speech recordings. The statistical modeling of phonetic data has undergone three major transitions: the shift from fixed effects to random effects regression models, the modeling of curve data (for instance, via generalized additive mixed models or functional data analysis methods), and the use of Bayesian methods. These transitions have been driven in part by the increased focus on large speech corpora in phonetics, which has arisen from machine learning methods such as forced alignment. We conclude by identifying opportunities for future research.

/content/journals/10.1146/annurev-statistics-112723-034642
2025-03-07
2025-04-25
  • Article Type: Review Article