
Abstract

Language use in everyday life can be studied using lightweight, wearable recorders that collect long-form recordings—that is, audio (including speech) over whole days. The hardware and software underlying this technique are increasingly accessible and inexpensive, and these data are revolutionizing the language acquisition field. We first place this technique into the broader context of the current ways of studying both the input being received by children and children's own language production, laying out the main advantages and drawbacks of long-form recordings. We then go on to argue that a unique advantage of long-form recordings is that they can fuel realistic models of early language acquisition that use speech to represent children's input and/or to establish production benchmarks. To enable the field to make the most of this unique empirical and conceptual contribution, we outline what this reverse engineering approach from long-form recordings entails, why it is useful, and how to evaluate success.

/content/journals/10.1146/annurev-linguistics-031120-122120
2022-01-14
2024-04-16

  • Article Type: Review Article