Abstract

Semiautomatic analysis of digital speech collections is transforming the science of phonetics. Convenient search and analysis of large published bodies of recordings, transcripts, metadata, and annotations—up to three or four orders of magnitude larger than a few decades ago—have created a trend towards “corpus phonetics,” whose benefits include greatly increased researcher productivity, better coverage of variation in speech patterns, and crucial support for reproducibility. The results of this work include insights into theoretical questions at all levels of linguistic analysis, along with applications in fields as diverse as psychology, medicine, and poetics, as well as within phonetics itself. Remaining challenges include still-limited access to the necessary skills and a lack of consistent standards. These changes coincide with the broader Open Data movement, but future solutions will also need to include more constrained forms of publication motivated by valid concerns for privacy, confidentiality, and intellectual property.

/content/journals/10.1146/annurev-linguistics-011516-033830
2019-01-14
2024-06-21

  • Article Type: Review Article