Reverse Engineering Language Acquisition with Child-Centered Long-Form Recordings

Marvin Lavechin; Maureen de Seyssel; Lucas Gautheron; Emmanuel Dupoux; Alejandrina Cristia

doi:10.1146/annurev-linguistics-031120-122120

Marvin Lavechin¹,²,³, Maureen de Seyssel¹,²,⁴, Lucas Gautheron¹, Emmanuel Dupoux¹,²,³, and Alejandrina Cristia¹

Affiliations: ¹Laboratoire de Sciences Cognitives et Psycholinguistique, Département d’Etudes cognitives, ENS, EHESS, CNRS, PSL University, Paris, France; ²Cognitive Machine Learning Team, INRIA, Paris, France; ³Facebook AI Research, Paris, France; ⁴Laboratoire de linguistique formelle, Université de Paris, CNRS, Paris, France
Vol. 8:389-407 (Volume publication date January 2022) https://doi.org/10.1146/annurev-linguistics-031120-122120
First published as a Review in Advance on November 15, 2021
Copyright © 2022 by Annual Reviews. All rights reserved

Abstract

Language use in everyday life can be studied using lightweight, wearable recorders that collect long-form recordings—that is, audio (including speech) over whole days. The hardware and software underlying this technique are increasingly accessible and inexpensive, and these data are revolutionizing the language acquisition field. We first place this technique into the broader context of the current ways of studying both the input being received by children and children's own language production, laying out the main advantages and drawbacks of long-form recordings. We then go on to argue that a unique advantage of long-form recordings is that they can fuel realistic models of early language acquisition that use speech to represent children's input and/or to establish production benchmarks. To enable the field to make the most of this unique empirical and conceptual contribution, we outline what this reverse engineering approach from long-form recordings entails, why it is useful, and how to evaluate success.

Keywords: computational studies, ecological validity, language acquisition, LENA, long-form recordings, reverse engineering

Annual Review of Linguistics 8, 389 (2022); https://doi.org/10.1146/annurev-linguistics-031120-122120

Article Type: Review Article
