
Abstract

Speech research in recent years has moved progressively away from its traditional focus on audition toward a more multisensory approach. In addition to audition and vision, many somatosenses, including proprioception, pressure, vibration, and aerotactile sensation, are highly relevant modalities for experiencing and/or conveying speech. In this article, we review both long-standing cross-modal effects stemming from decades of audiovisual speech research and new findings related to somatosensory effects. Cross-modal effects in speech perception have to date been found to be constrained by temporal congruence and signal relevance, but appear to be unconstrained by spatial congruence. The literature reveals that, far from taking place in a one-, two-, or even three-dimensional space, speech occupies a highly multidimensional sensory space. We argue that future research on cross-modal effects should expand to consider each of these modalities in speech, both separately and in combination with other modalities.
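
To make the temporal-congruence constraint mentioned above concrete: cross-modal fusion is commonly described as gated by an asymmetric temporal window, in which perceivers tolerate visual-lead asynchronies considerably better than audio-lead ones. The sketch below is a minimal, hypothetical illustration of such a gate; the window limits are assumed round numbers in the spirit of reported audiovisual findings, not values drawn from this review.

    # Minimal sketch of an asymmetric temporal integration window.
    # Assumption: the 200 ms visual-lead and 50 ms audio-lead limits are
    # illustrative placeholders, not parameters reported in this article.

    def within_integration_window(audio_onset_ms: float,
                                  visual_onset_ms: float,
                                  visual_lead_limit_ms: float = 200.0,
                                  audio_lead_limit_ms: float = 50.0) -> bool:
        """Return True if the two onsets are close enough in time that
        a perceiver would plausibly fuse them into one speech event."""
        lag = audio_onset_ms - visual_onset_ms  # positive when vision leads
        return -audio_lead_limit_ms <= lag <= visual_lead_limit_ms

    # Vision leads audio by 100 ms: inside the window, so fusion is expected.
    print(within_integration_window(audio_onset_ms=100.0, visual_onset_ms=0.0))   # True
    # Audio leads vision by 150 ms: outside the (narrower) audio-lead limit.
    print(within_integration_window(audio_onset_ms=0.0, visual_onset_ms=150.0))   # False

The asymmetry reflects a design choice in such models: because light-driven articulatory information naturally precedes the acoustic signal it predicts, a visual lead is treated as ecologically normal while an audio lead is not.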

