Rosglue documentation

From Brown University Robotics


Revision as of 03:59, 10 August 2010

Introduction

Reinforcement learning (RL) is a sub-area of machine learning concerned with how an agent should select actions given its environment, and it often cites robotics as a potential application area. While some researchers have bridged the gap and used RL algorithms in robot applications, most RL experiments happen in simulation and are never ported to robots because of the difficulty of programming and maintaining a robot. Additionally, individuals in the RL community use their own software frameworks for creating and evaluating learning techniques. RL-Glue is a standard interface that allows RL researchers to share agents, environments, and experiment programs. Robotics has suffered from similar problems: labs have primarily created their own infrastructure, and evaluation across different techniques has become difficult if not impossible. ROS is a large, sophisticated research tool that is currently being used by many roboticists worldwide.

We introduce rosglue, a framework that allows robots running ROS to be environments for RL-Glue agents. Our hope is that this may lead to increased communication between the fields and open further collaborations.

Short Primer on ROS

ROS is an open-source robot middleware system. It provides many services, including hardware abstraction, low-level device control, implementations of commonly used functionality, and message passing. If you're familiar with ROS, feel free to skim or skip this section. If you've never heard of ROS before or know very little about it, you can learn more by checking out the tutorials and documentation at http://www.ros.org/wiki/. However, our goal is to allow you to use at least some robots running ROS with as little understanding of ROS as possible.

Topics and Services

Perhaps the most important thing to understand about ROS is how it exposes the functionality of the robot. This happens in one of two ways: as a topic or as a service. Both services and topics can be used for observing the robot's environment or for performing control.

Topics provide asynchronous communication through streams of objects. A process can publish to a topic, and other processes may subscribe to that topic and use the data as they wish without communicating directly with the publisher process.

Services are a synchronous communication system and are much like function calls in many programming languages: they take in arguments and return responses. Services, under ROS, will always return an object, which can be arbitrarily complex.

Example
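The difference between the two styles can be sketched with a small in-process model. This is only an illustration in plain Python and does not use the ROS client libraries: ToyBus and its methods are hypothetical names, the /position topic and /act service are placeholders, and the callbacks run immediately rather than asynchronously as real topic delivery would.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class ToyBus:
    """In-process stand-in for topic and service routing (not the ROS API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self._services: Dict[str, Callable[..., Any]] = {}

    # Topics: one-way and many-to-many; the publisher never hears back.
    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        for callback in self._subscribers[topic]:
            callback(message)

    # Services: request and response; the caller waits for the returned object.
    def advertise_service(self, name: str, handler: Callable[..., Any]) -> None:
        self._services[name] = handler

    def call_service(self, name: str, *args: Any) -> Any:
        return self._services[name](*args)


bus = ToyBus()

# A subscriber records every message published on the placeholder /position topic.
seen_positions: List[Any] = []
bus.subscribe("/position", seen_positions.append)
bus.publish("/position", {"x": 0.5, "y": 1.0, "theta": 0.0})

# A placeholder /act service takes an action index and returns a response object.
bus.advertise_service("/act", lambda action: {"accepted": 0 <= action <= 2})
response = bus.call_service("/act", 1)
```

The publisher only pushes data and gets nothing back, while the service caller receives a response object it can inspect.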

Short Primer on RL-Glue

RL-Glue provides a standard interface for the three major components of an RL system: the agent, the environment, and the experiment. As with ROS, if you're familiar with RL-Glue, feel free to skim or skip this section. If you've never heard of RL-Glue before or know very little about it, you can learn more by checking out http://glue.rl-community.org/wiki/Main_Page.

To program in RL-Glue, developers download a codec for the language of their choice; currently C/C++, Java, Lisp, Matlab, and Python are supported. The RL-Glue interface is a series of functions that are defined by the codec.

  ADD EXAMPLE HERE

These functions define a low-level protocol for connecting agents, environments, and experiments. The developers fill in the functions with the desired functionality, and then rl_glue passes the messages between the agent, environment, and experiment. Users can also use the RL-Library [1], an open-source collection of RL-Glue compatible agents, environments, and experiments.
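As a rough illustration of the environment side of this protocol, the sketch below fills in the five environment functions (env_init, env_start, env_step, env_cleanup, env_message) for a toy problem. It does not import the Python codec: CorridorEnvironment, StepResult, and run_episode are hypothetical names, and run_episode merely stands in for the message passing that rl_glue performs between separate programs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    """Reward, next observation, and terminal flag returned by env_step."""
    reward: float
    observation: List[int]
    terminal: bool


class CorridorEnvironment:
    """Hypothetical 1-D corridor: start in cell 0, goal in the last cell."""

    def __init__(self, length: int = 4) -> None:
        self.length = length
        self.position = 0

    def env_init(self) -> str:
        # Returns the task spec string that describes this problem.
        return ("VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1 "
                f"OBSERVATIONS INTS (0 {self.length - 1}) ACTIONS INTS (0 1) "
                "REWARDS (-1.0 0.0) EXTRA toy corridor sketch")

    def env_start(self) -> List[int]:
        # Resets the episode and returns the first observation.
        self.position = 0
        return [self.position]

    def env_step(self, action: int) -> StepResult:
        # Action 0 moves left, action 1 moves right; the walls clamp the move.
        delta = 1 if action == 1 else -1
        self.position = min(max(self.position + delta, 0), self.length - 1)
        terminal = self.position == self.length - 1
        return StepResult(0.0 if terminal else -1.0, [self.position], terminal)

    def env_cleanup(self) -> None:
        pass

    def env_message(self, message: str) -> str:
        return "no messages supported"


def run_episode(env: CorridorEnvironment,
                policy: Callable[[List[int]], int],
                max_steps: int = 100) -> Tuple[float, int]:
    """Stand-in for rl_glue: drives one episode and returns (return, steps)."""
    env.env_init()
    observation = env.env_start()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        result = env.env_step(policy(observation))
        total_reward += result.reward
        steps += 1
        if result.terminal:
            break
        observation = result.observation
    env.env_cleanup()
    return total_reward, steps


# A fixed "always move right" policy plays the role of the agent here.
total_reward, steps = run_episode(CorridorEnvironment(), lambda obs: 1)
```

With a real codec, the agent and experiment would be separate programs, and the environment would return the codec's own observation and reward types rather than the StepResult class assumed here.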

One of the most important things to understand about RL-Glue is the task spec. The task spec is essentially the problem definition in RL-Glue. It follows this template:

 VERSION <version-name> PROBLEMTYPE <problem-type> DISCOUNTFACTOR <discount-factor> 
 OBSERVATIONS INTS ([times-to-repeat-this-tuple=1] <min-value> <max-value>)* 
 DOUBLES ([times-to-repeat-this-tuple=1] <min-value> <max-value>)* 
 CHARCOUNT <char-count> ACTIONS INTS ([times-to-repeat-this-tuple=1] <min-value> <max-value>)* 
 DOUBLES ([times-to-repeat-this-tuple=1] <min-value> <max-value>)* 
 CHARCOUNT <char-count> REWARDS (<min-value> <max-value>) 
 EXTRA [extra text of your choice goes here]

  • VERSION - the RL-Glue version. rosglue is only guaranteed to work with RL-Glue version 3.0.
  • PROBLEMTYPE - episodic or continuing
  • DISCOUNTFACTOR - a number between 0 and 1
  • OBSERVATIONS - the observations from the environment
  • ACTIONS - the actions the agent will take
  • times-to-repeat-this-tuple - the number of times a tuple is repeated. You can write (3 0 1) rather than (0 1) (0 1) (0 1).
  • char-count - the size of the character array


An example of a task spec for RL-Glue is:

 VERSION RL-Glue-3.0 PROBLEMTYPE episodic 
 DISCOUNTFACTOR 1 OBSERVATIONS INTS (2 0 1) 
 DOUBLES (2 -2 0.5) (-.5 .5) ACTIONS INTS (0 4) 
 REWARDS (-5.0 5.0) EXTRA additional notes go here (for example author and problem name) 

This defines the learning problem as:

  • an episodic learning problem
  • a discount factor of 1
  • 2-dimensional integer observations, each either 0 or 1
  • 3-dimensional continuous observations: two between -2 and .5, and the third between -.5 and .5
  • a 1-dimensional integer action with values 0, 1, 2, 3, 4 (the range is inclusive)
  • a minimum reward of -5 and a maximum reward of 5
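To make the reading of a task spec concrete, here is a minimal parsing sketch in plain Python. It is not the parser shipped with any RL-Glue codec: parse_task_spec, parse_space, and expand_ranges are hypothetical helpers, and the sketch assumes purely numeric ranges, a present EXTRA field, and ignores CHARCOUNT.

```python
import re
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def expand_ranges(text: str) -> List[Range]:
    """Expand "(n min max)" and "(min max)" tuples into one range per dimension."""
    ranges: List[Range] = []
    for group in re.findall(r"\(([^)]*)\)", text):
        values = group.split()
        if len(values) == 3:
            repeat, low, high = int(values[0]), float(values[1]), float(values[2])
        elif len(values) == 2:
            repeat, low, high = 1, float(values[0]), float(values[1])
        else:
            raise ValueError(f"unexpected tuple: ({group})")
        ranges.extend([(low, high)] * repeat)
    return ranges


def parse_space(text: str) -> Dict[str, List[Range]]:
    """Split an OBSERVATIONS or ACTIONS section into its INTS and DOUBLES parts."""
    space: Dict[str, List[Range]] = {"INTS": [], "DOUBLES": []}
    # Each part runs from its keyword to the next keyword or the end of the section.
    parts = re.findall(r"(INTS|DOUBLES)(.*?)(?=INTS|DOUBLES|CHARCOUNT|$)", text, re.S)
    for kind, body in parts:
        space[kind] = expand_ranges(body)
    return space


def parse_task_spec(spec: str) -> dict:
    """Parse a task spec string into a dictionary (numeric ranges only)."""
    pattern = (r"VERSION\s+(?P<version>\S+)\s+PROBLEMTYPE\s+(?P<problemtype>\S+)\s+"
               r"DISCOUNTFACTOR\s+(?P<discount>\S+)\s+OBSERVATIONS(?P<observations>.*?)"
               r"ACTIONS(?P<actions>.*?)REWARDS(?P<rewards>.*?)EXTRA(?P<extra>.*)")
    match = re.match(pattern, spec.strip(), re.S)
    if match is None:
        raise ValueError("string does not follow the task spec template")
    return {
        "version": match["version"],
        "problemtype": match["problemtype"],
        "discountfactor": float(match["discount"]),
        "observations": parse_space(match["observations"]),
        "actions": parse_space(match["actions"]),
        "rewards": expand_ranges(match["rewards"])[0],
        "extra": match["extra"].strip(),
    }


parsed = parse_task_spec(
    "VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1 "
    "OBSERVATIONS INTS (2 0 1) DOUBLES (2 -2 0.5) (-.5 .5) "
    "ACTIONS INTS (0 4) REWARDS (-5.0 5.0) EXTRA additional notes go here"
)
```

Here parsed["observations"]["DOUBLES"] holds three ranges, because the leading 2 in (2 -2 0.5) repeats that range twice before the final (-.5 .5) tuple is appended.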


A more in-depth discussion of the task spec can be found here: [2]

Using rosglue

rosglue is designed to be a bridge between RL-Glue and ROS. As pictured in the figure, rosglue treats a robot running ROS as an RL-Glue environment.

Rather than coding up an RL-Glue environment, the user defines the robotics environment through a YAML file. This YAML file is similar to the RL-Glue task spec.


An example YAML file:

 problemtype: episodic
 discountfactor: 1
 observations:
   /position:
     x:
       - -0.05
       - 3.0
     y:
       - -0.05
       - 3.0
     theta:
       - -1.6
       - 3.5
 actions:
   /act:
     - service
     - action:
         - 0
         - 2
 reward:
   type: glue #ros or glue
   /position:
     x:
       - -0.05
       - 3.0
     y:
       - -0.05
       - 3.0
   range:
     - -1
     - 10
 termination:
   type: glue #ros or glue
 extra: iRobotCreate by Sarah Osentoski


From this file, rosglue creates the corresponding task spec: "VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1 OBSERVATIONS DOUBLES (2 -2.0 10.0) (-2.0 2.0) ACTIONS INTS (0 3) REWARDS (-1.0 10.0) EXTRA iRobotCreate by Sarah Osentoski."