Although the study of human-robot interaction has made great strides in recent years, robots are still far from being able to interact autonomously with humans in unstructured human environments. If we are to bring robots into our homes and offices, it is not likely that their builders will be able to engineer appropriate behaviors for every circumstance they might encounter, nor will the robots come with convenient Wizards of Oz to teleoperate them at every turn.

To perform the many complex, unforeseen tasks awaiting them, robots will need to learn from users who are not themselves roboticists or programmers. One promising way to accomplish this learning is to demonstrate desired tasks to a robot observer. This approach has several advantages: the teacher needs no expertise in programming the robot, nor any pedagogical insight into the learning process. Nothing is required of the user beyond the ability to complete the task in some fashion that the robot can interpret.

However, this places the onus on inference by the learner, and interpreting and learning from a demonstration is by no means straightforward. A human teacher and a robotic student have very different physical affordances and sensory modalities, and how to establish a trustworthy mapping between the two remains an open question.

This work studies the mismatch between human and robot perceptual abilities. Although a robot with a camera may be able to detect, identify, and locate a few particularly salient visual cues, no existing algorithm can make sense of all of the context, spatial relationships, identities, and qualities of every object in a scene, all of which are apparent at a glance to a human. Thus, a robot's attempt to learn a policy may be thwarted by the simple fact that the human makes decisions based on features of the environment that the robot's sensory apparatus cannot yet perceive.
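As a toy sketch of this failure mode (the features, actions, and demonstration data here are hypothetical, not drawn from our system): if the teacher's choice of action depends on a property the robot cannot sense, the demonstrations look contradictory when projected onto the features the robot does observe, and no consistent policy over those features exists.

```python
# Toy illustration: the teacher acts on a hidden feature ("weight"),
# so identical observed states carry different demonstrated actions.
from collections import Counter, defaultdict

# Each step: (features the robot observes, feature only the human perceives,
# action the teacher demonstrated). All names are hypothetical.
demonstrations = [
    ({"color": "red",  "shape": "box"}, {"weight": "heavy"}, "push"),
    ({"color": "red",  "shape": "box"}, {"weight": "light"}, "lift"),
    ({"color": "blue", "shape": "box"}, {"weight": "heavy"}, "push"),
    ({"color": "blue", "shape": "box"}, {"weight": "light"}, "lift"),
]

def learn_policy(demos, visible_keys):
    """Majority-vote policy over only the features the robot can observe."""
    votes = defaultdict(Counter)
    for visible, _hidden, action in demos:
        key = tuple(visible[k] for k in visible_keys)
        votes[key][action] += 1
    policy = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    return policy, votes

policy, votes = learn_policy(demonstrations, ["color", "shape"])
for state, counter in votes.items():
    agreement = counter.most_common(1)[0][1] / sum(counter.values())
    # 50% agreement means the same observed state was demonstrated with two
    # different actions: the decision hinged on the unperceived weight.
    print(state, dict(counter), f"best-action agreement: {agreement:.0%}")
```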

One aspect of solving this problem relies on mediating the perceptual mismatch between humans and robots. We report on solutions to this problem in . Furthermore, we believe that previous learning-from-demonstration projects have been hampered by insufficient data collection; we are therefore building interfaces, remote laboratories, and infrastructure to support the collection of demonstration data on a truly large scale.

Video: Human and robot perception in large-scale learning from demonstration