Learning Robot Controllers from Demonstration

At its core, robot learning from demonstration (LfD) concerns inferring a desired decision policy π(s) → a for some unknown task, which exists as a latent mapping in the mind of a human user, from observations of the actions taken by that user across various states. In other words, the robot's estimated policy π̂ should match the user's intended decision making π as closely as possible, assuming similar situation awareness, a fixed perception of state s, and motion control over actions a.
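One way to state this inference problem precisely, assuming N demonstrated state-action pairs, a squared-error loss, and a hypothesis class H of candidate policies (the loss and the class are illustrative choices, not taken from the text), is as empirical risk minimization:

```latex
\mathcal{D} = \{(s_i, a_i)\}_{i=1}^{N}, \qquad a_i \approx \pi(s_i), \qquad
\hat{\pi} = \arg\min_{\pi' \in \mathcal{H}} \sum_{i=1}^{N} \lVert \pi'(s_i) - a_i \rVert^2
```

Under this loss, if the demonstrations contain different actions for the same state, the best single-valued fit at that state is their average, which may be an action the user never intended.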

Our recent work approaches this problem through interactive demonstration and thus casts robot control policy estimation as a form of supervised regression, or function approximation [Grollman and Jenkins 2008 ICRA]. Here, demonstration data are observations of state-action pairs (s_i, a_i) gathered while a user teleoperates the robot to perform the task. Experimenting in the domain of robot soccer, my research group has used sparse and incremental nonparametric regression, specifically Sparse Online Gaussian Processes [Csato and Opper 2002 Neural Computation], for direct policy LfD. Similar to other nonparametric regression approaches to LfD [Atkeson and Schaal 1998 ICML], our work has shown that individual skills, each a fixed, unique mapping from robot state to an immediate objective, can be readily learned in this manner; examples include approaching, trapping, and shooting a ball. Our most interesting finding, however, was that learning generic controllers, such as those for assembly tasks on a mobile manipulator, is an ill-defined regression problem due to perceptual aliasing, where the policy is a non-unique function across multiple skill-level objectives. That is, a controller composed of a sequence of multiple skills can be a multivalued mapping when different skills map the same state to distinctly different actions.
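The contrast between a single skill and an aliased, multi-skill task can be sketched with plain Gaussian Process regression. This is a minimal sketch assuming a 1-D state, hand-picked kernel hyperparameters, and exact batch inference rather than the Sparse Online GP used in the cited work; `rbf_kernel` and `gp_posterior_mean` are hypothetical helpers written for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential covariance between two 1-D arrays of states."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(states, actions, query, noise_var=0.01):
    """Exact (non-sparse, batch) GP regression mean of the action at each query state."""
    K = rbf_kernel(states, states) + noise_var * np.eye(len(states))
    alpha = np.linalg.solve(K, actions)
    return rbf_kernel(query, states) @ alpha

# Hypothetical demonstrations over a 1-D state (e.g. bearing to the ball).
s = np.linspace(-1.0, 1.0, 41)
query = np.array([0.0, 0.5])

# Single skill: a unique mapping, here the made-up rule a = -s.
single = gp_posterior_mean(s, -s, query)

# Aliased task: the same states are demonstrated under two skills with opposite actions.
s_both = np.concatenate([s, s])
a_both = np.concatenate([np.ones_like(s), -np.ones_like(s)])
aliased = gp_posterior_mean(s_both, a_both, query)

print(single)   # close to [0.0, -0.5]
print(aliased)  # approximately [0.0, 0.0], an action neither skill demonstrated
```

The single skill is recovered almost exactly, while the aliased data collapse to the mean of +1 and -1: a unimodal regressor cannot represent the multivalued mapping.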

To address this LfD problem, we have developed models and algorithms for sparse and incremental infinite mixtures of Gaussian Process regression experts [Grollman and Jenkins 2010 IROS] that learn multiple skill-level policies from demonstration and task-level topologies describing the transition dynamics between skills, and that use reinforcement learning to autonomously refine individual skills. We learn these models using particle filters, which makes inference much more practical for robotics applications, and we have demonstrated robot goal scoring with completely learned primitive policies.
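The idea of resolving aliasing by assigning each demonstrated pair to one of an unbounded number of GP experts, with a particle filter over those assignments, can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes a Chinese restaurant process prior over experts, exact (non-sparse) GP experts, sequential importance resampling that uses each particle's predictive likelihood as its weight, and hand-picked hyperparameters. It omits the task-level topology and reinforcement-learning refinement, and all names are hypothetical.

```python
import numpy as np

# Hypothetical hyperparameters, hand-picked for this toy example.
LENGTH_SCALE, SIGNAL_VAR, NOISE_VAR = 1.0, 1.0, 0.01
ALPHA = 0.1  # CRP concentration: willingness to open a new expert

def kern(a, b):
    """Squared-exponential covariance between two 1-D arrays of states."""
    d = a[:, None] - b[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (d / LENGTH_SCALE) ** 2)

def gp_predictive(s_train, a_train, s_new):
    """Mean and variance of one expert's GP predictive for the action at s_new."""
    K = kern(s_train, s_train) + NOISE_VAR * np.eye(len(s_train))
    k = kern(s_train, np.array([s_new]))[:, 0]
    mean = k @ np.linalg.solve(K, a_train)
    var = SIGNAL_VAR - k @ np.linalg.solve(K, k) + NOISE_VAR
    return mean, var

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def smc_mixture_of_gps(states, actions, n_particles=50, seed=0):
    """Sequential importance resampling over expert labels for a CRP mixture of GPs."""
    rng = np.random.default_rng(seed)
    particles = [[] for _ in range(n_particles)]  # each particle: expert label per point
    log_w = np.zeros(n_particles)
    for t in range(len(states)):
        s_t, a_t = states[t], actions[t]
        for p, labels in enumerate(particles):
            z = np.array(labels, dtype=int)
            n_experts = int(z.max()) + 1 if t > 0 else 0
            joint = np.empty(n_experts + 1)
            for k in range(n_experts):  # existing experts: CRP prior x GP predictive
                mask = z == k
                m, v = gp_predictive(states[:t][mask], actions[:t][mask], s_t)
                joint[k] = mask.sum() / (t + ALPHA) * normal_pdf(a_t, m, v)
            # A brand-new expert predicts with the GP prior.
            joint[n_experts] = ALPHA / (t + ALPHA) * normal_pdf(a_t, 0.0, SIGNAL_VAR + NOISE_VAR)
            log_w[p] += np.log(joint.sum())  # weight: likelihood with the label marginalized
            labels.append(int(rng.choice(n_experts + 1, p=joint / joint.sum())))
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:  # resample when effective size is low
            idx = rng.choice(n_particles, size=n_particles, p=w)
            particles = [list(particles[i]) for i in idx]
            log_w = np.zeros(n_particles)
    return particles[int(np.argmax(log_w))]

# Hypothetical unsegmented demonstration: two skills interleaved over the same states.
rng = np.random.default_rng(1)
true_skill = rng.integers(0, 2, 40)
states = rng.uniform(-0.5, 0.5, 40)
actions = np.where(true_skill == 0, 3.0, -3.0) + 0.05 * rng.standard_normal(40)
labels = np.array(smc_mixture_of_gps(states, actions))
print(labels)
```

In this toy setting, the returned labels should separate the +3 and -3 demonstrations even though both skills visit the same states, because a point from the other skill is very unlikely under an expert fitted to one skill. Each expert is then an ordinary single-valued regressor.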


D. Grollman and O. C. Jenkins, “Dogged Learning for Robots,” in IEEE International Conference on Robotics and Automation (ICRA 2007), Rome, Italy, 2007, pp. 2483-2488.

D. Grollman and O. C. Jenkins, “Sparse Incremental Learning for Interactive Robot Control Policy Estimation,” in IEEE International Conference on Robotics and Automation (ICRA 2008), Pasadena, CA, USA, 2008, pp. 3315-3320.

D. Grollman and O. C. Jenkins, “Can We Learn Finite State Machine Robot Controllers from Interactive Demonstration?” Springer, 2009, pp. 405-429.

D. Grollman and O. C. Jenkins, “Incremental Learning of Subtasks from Unsegmented Demonstration,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), Taipei, Taiwan, 2010, pp. 261-266.

J. Butterfield, S. Osentoski, G. Jay, and O. C. Jenkins, “Learning from Demonstration using a Multi-valued Function Regressor for Time-series Data,” in IEEE-RAS International Conference on Humanoid Robots (Humanoids 2010), 2010, pp. 328-333.