Robots That Use Language

Abstract

This article surveys the use of natural language in robotics from a robotics point of view. To use human language, robots must map words to aspects of the physical world, mediated by the robot's sensors and actuators. This problem differs from other natural language processing domains due to the need to ground the language to noisy percepts and physical actions. Here, we describe central aspects of language use by robots, including understanding natural language requests, using language to drive learning about the physical world, and engaging in collaborative dialogue with a human partner. We describe common approaches, roughly divided into learning methods, logic-based methods, and methods that focus on questions of human–robot interaction. Finally, we describe several application domains for language-using robots.
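To make the grounding problem concrete, the following is a minimal illustrative sketch, not drawn from the article itself: it maps an already-parsed command triple (e.g., "pick up the red block") onto noisy object detections by choosing the highest-confidence percept that matches, and reports failure when no percept matches. All names here (Percept, ground_command, the confidence field) are hypothetical.

```python
# Hypothetical minimal sketch of language grounding: a parsed command is
# matched against noisy percepts from the robot's object detector.
from dataclasses import dataclass

@dataclass
class Percept:
    """One detected object: label, color, and detector confidence in [0, 1]."""
    label: str
    color: str
    confidence: float

def ground_command(verb: str, color: str, noun: str,
                   percepts: list[Percept]) -> tuple[str, Percept] | None:
    """Ground a (verb, color, noun) triple by selecting the matching
    detection with the highest confidence; return None on failure."""
    candidates = [p for p in percepts
                  if p.label == noun and p.color == color]
    if not candidates:
        return None  # grounding failure: a dialogue system might ask for clarification
    best = max(candidates, key=lambda p: p.confidence)
    return (verb, best)

# Example scene: a low-confidence red block and a high-confidence blue block.
scene = [Percept("block", "red", 0.62), Percept("block", "blue", 0.97)]
print(ground_command("pick_up", "red", "block", scene))
```

In this toy form, a grounding failure (the None branch) is exactly the point at which the collaborative-dialogue methods surveyed in the article would have the robot ask a clarifying question rather than act on an unresolved reference.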

