Rosglue documentation

From Brown University Robotics


Introduction

Reinforcement learning (RL) is a sub-area of machine learning concerned with how an agent should select actions given its environment, and it often cites robotics as a potential application area. While some researchers have bridged the gap and used RL algorithms in robot applications, most RL experiments happen in simulation and are never ported to a robot due to programming and maintenance difficulties. Additionally, individuals in the RL community use their own software frameworks for evaluating and creating learning techniques. Robotics has suffered from similar problems: labs have primarily created their own infrastructure, and evaluation across different techniques has become difficult if not impossible. RL-Glue is a standard interface that allows RL researchers to share agents, environments, and experiment programs. ROS is likewise a middleware tool for robotics. The use of middleware systems like ROS and RL-Glue, together with the increasing availability of off-the-shelf robots, presents an opportunity for RL algorithms to be used for robotic tasks.

We introduce rosglue, a framework that allows robots running ROS to serve as environments for RL-Glue agents. Our hope is that this will lead to increased communication between the two fields and open further collaborations. rosglue is designed to be a bridge between RL-Glue and ROS. A high-level visualization of this framework is shown below.


Short Primer on ROS

ROS is an open-source robot middleware system. It provides many services, including hardware abstraction, low-level device control, implementations of commonly used functionality, and message passing. If you're familiar with ROS, feel free to skim or skip this section. If you've never heard of ROS before, or know very little about it, you can learn more by checking out the tutorials and documentation at http://www.ros.org/wiki/ The goal of this project is to allow you to begin using RL-Glue with a robot running ROS quickly.

Topics and Services

Perhaps the most important thing to understand about ROS is how it exposes the functionality of the robot. This happens in one of two ways: as a topic or as a service. Both topics and services can be used for observing the robot's environment or for performing control.

Topics provide asynchronous communication of streams of messages. A process can publish a topic, and other processes may subscribe to that topic and use the data as they wish without communicating directly with the publishing process. Available topics can be viewed and examined using the rostopic command: http://www.ros.org/wiki/rostopic
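
As a concrete illustration, the following sketch subscribes to a topic using the Python client library rospy. The topic name /chatter and the std_msgs/String message type are assumptions made for the example; substitute the topic and message type your robot actually publishes.

 import rospy
 from std_msgs.msg import String
 
 def callback(msg):
     # Invoked asynchronously each time a new message arrives on the topic.
     rospy.loginfo("heard: %s" % msg.data)
 
 rospy.init_node('listener')
 rospy.Subscriber('/chatter', String, callback)
 rospy.spin()  # keep the process alive so callbacks continue to fire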

Services provide synchronous communication and are much like function calls in many programming languages: they take in arguments and return responses. A ROS service always returns an object, which can be arbitrarily complex. Available services can be viewed and examined using the rosservice command: http://www.ros.org/wiki/rosservice
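
A matching sketch for calling a service with rospy is shown below; the service name /reset and the std_srvs/Empty service type are assumptions made for the example.

 import rospy
 from std_srvs.srv import Empty
 
 rospy.init_node('service_caller')
 rospy.wait_for_service('/reset')             # block until the service is advertised
 reset = rospy.ServiceProxy('/reset', Empty)  # create a callable proxy for the service
 response = reset()                           # synchronous call; returns the (empty) response object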

Short Primer on RL-Glue

RL-Glue provides a standard interface for the three major components of an RL system: the agent, the environment, and the experiment. As with ROS, if you're familiar with RL-Glue, feel free to skim or skip this section. If you've never heard of RL-Glue before, or know very little about it, you can learn more at http://glue.rl-community.org/wiki/Main_Page

To program in RL-Glue, developers download a codec for the language of their choice; currently C/C++, Java, Lisp, Matlab, and Python are supported. The RL-Glue interface is a series of functions defined by the codec. For example, a standard Python RL-Glue environment must define the following functions (a minimal skeleton is sketched after the list):

  • env_init: called once, the first time the environment is used. Returns the problem definition, or Task Spec (which we will describe in more detail later in this section).
  • env_start: called every time the environment is reinitialized or a new episode begins. Returns the first observation of the episode.
  • env_step: takes in an action, performs it, and returns the reward, the resulting observation, and a terminal flag.
  • env_cleanup: frees any resources once the experiment is complete.
  • env_message: takes a message as input; this can be used to customize an environment to handle particularities that may not have been anticipated.
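
A minimal sketch of such an environment, assuming the standard RL-Glue Python codec (the rlglue package), is shown below. The dynamics are placeholders; only the structure of the five functions matters here.

 from rlglue.environment.Environment import Environment
 from rlglue.environment import EnvironmentLoader as EnvironmentLoader
 from rlglue.types import Observation, Reward_observation_terminal
 
 class MinimalEnvironment(Environment):
     def env_init(self):
         # Return the Task Spec string describing the problem (see below).
         return "VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1 " \
                "OBSERVATIONS INTS (0 1) ACTIONS INTS (0 1) REWARDS (-1.0 1.0) EXTRA minimal example"
 
     def env_start(self):
         # Begin a new episode and return the first observation.
         self.state = 0
         obs = Observation()
         obs.intArray = [self.state]
         return obs
 
     def env_step(self, action):
         # Apply the action (placeholder dynamics), then return the reward,
         # the next observation, and a terminal flag.
         self.state = action.intArray[0]
         obs = Observation()
         obs.intArray = [self.state]
         result = Reward_observation_terminal()
         result.r = 1.0 if self.state == 1 else 0.0
         result.o = obs
         result.terminal = 1 if self.state == 1 else 0
         return result
 
     def env_cleanup(self):
         pass
 
     def env_message(self, message):
         return "unhandled message"
 
 if __name__ == "__main__":
     EnvironmentLoader.loadEnvironment(MinimalEnvironment())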


These functions define a low-level protocol for connecting agents, environments, and experiments. The developers fill in the functions with the desired functionality, and rl_glue then passes messages between the agent, environment, and experiment. Users can also use RL-Library [1], an open-source collection of RL-Glue-compatible agents, environments, and experiments.

One of the most important things to understand about RL-Glue is the Task Spec. The Task Spec is the problem definition in RL-Glue. It follows this template:

 VERSION <version-name> PROBLEMTYPE <problem-type> DISCOUNTFACTOR <discount-factor>
 OBSERVATIONS INTS ([times-to-repeat-this-tuple=1] <min-value> <max-value>)*
              DOUBLES ([times-to-repeat-this-tuple=1] <min-value> <max-value>)*
              CHARCOUNT <char-count>
 ACTIONS INTS ([times-to-repeat-this-tuple=1] <min-value> <max-value>)*
         DOUBLES ([times-to-repeat-this-tuple=1] <min-value> <max-value>)*
         CHARCOUNT <char-count>
 REWARDS (<min-value> <max-value>)
 EXTRA [extra text of your choice goes here]
  • VERSION - refers to the RL-Glue version. rosglue is only guaranteed to work with RL-Glue version 3.0
  • PROBLEMTYPE - episodic or continuing
  • DISCOUNTFACTOR - a number between 0 and 1
  • OBSERVATIONS - the observations from the environment
  • ACTIONS - the actions the agent will take
  • times-to-repeat-this-tuple - the number of times a tuple is repeated; you can write (3 0 1) rather than (0 1) (0 1) (0 1)
  • char-count - the size of the character array


An example of a Task Spec for RL-Glue is:

 VERSION RL-Glue-3.0 PROBLEMTYPE episodic
 DISCOUNTFACTOR 1 OBSERVATIONS INTS (2 0 1)
 DOUBLES (2 -2 0.5) (-.5 .5) ACTIONS INTS (0 4)
 REWARDS (-5.0 5.0) EXTRA additional notes go here (for example author and problem name)

This defines the learning problem as:

  • an episodic learning problem
  • a discount factor of 1
  • 2-dimensional integer observations, each either 0 or 1
  • 3-dimensional continuous observations, two between -2 and 0.5 and the third between -0.5 and 0.5
  • a 1-dimensional integer action with values between 0 and 4
  • a minimum reward of -5 and a maximum reward of 5


A more in-depth discussion of the Task Spec can be found here: [2]

Using rosglue

rosglue is designed to be a bridge between RL-Glue and ROS. rosglue treats a robot running ROS as an RL-Glue environment.

Observations

Currently rosglue allows observations to be ROS topics. Multiple topics can be combined to make up the observation vector. The user decides which topics to use as observations for the learning algorithm and specifies them through a YAML file, described in more detail below.

Actions

Actions may be either ROS topics, published by rosglue, or ROS services. They are specified in a similar manner to observations.

Reward and Termination Conditions

Reward and termination functions may be either ROS topics or custom Python functions created by the user.

Custom Functions

The reward function is named get_reward and is passed a state variable. The function accesses the state through topic and field names. For example, if the robot uses the /position topic as part of its observation vector and you wish to give reward when the robot reaches a certain position in the x dimension, the current value of the x field is accessed as follows:

  state['/position']['x']

The reward function is expected to return a single scalar (one-dimensional) reward.


The termination function works in a similar manner and is called check_termination_condition. This function returns a boolean value indicating whether a termination condition has been met.
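
For example, a custom reward/termination file might look like the sketch below. The goal position GOAL_X is a made-up value, and only get_reward's signature is stated explicitly above; whether check_termination_condition receives the same state argument should be confirmed against the rosglue source. Both functions are shown in one file for brevity, although, as the rosparam calls below suggest, they can also live in separate files.

 # mycode_reward.py (sketch)
 GOAL_X = 2.5  # hypothetical goal position in the x dimension
 
 def get_reward(state):
     # Return a single scalar reward: a bonus at the goal, a small step cost otherwise.
     x = state['/position']['x']
     return 10.0 if x >= GOAL_X else -1.0
 
 def check_termination_condition(state):
     # End the episode once the robot has reached the goal x position.
     return state['/position']['x'] >= GOAL_X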

When the user creates one of these custom functions, they must use rosparam to let rosglue know which file to use.

For example, if the function is in the file mycode_reward.py in the same directory as rosglue, the following call would be issued:

 rosparam set /brown/rosglue/rewardfile mycode_reward.py

or

 rosparam set /brown/rosglue/terminationfile mycode_termination.py


ROS Topics for Reward/Termination

ROS topics can also be used to specify the reward and termination functions. For example, if the robot uses the /position topic as part of its observation vector and you wish the agent to move to higher values in the x dimension, you could allow the x field of the /position topic to also serve as the reward and indicate this in the YAML file. The reward topic must still provide a one-dimensional reward, and the termination topic must be boolean.
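
A sketch of what the reward and termination sections of the YAML file might look like in that case is shown below. The type values mirror the example configuration later on this page, but the topic and field key names are assumptions made for illustration; consult the rosglue source or sample YAML files for the exact schema.

 reward:
    type: ros            # take the reward from a ROS topic instead of a custom function
    topic: /position     # hypothetical key: topic to read the reward from
    field: x             # hypothetical key: field within the topic's message
    range:
      - -1
      - 10
 termination:
    type: ros
    topic: /at_goal      # hypothetical boolean topic signalling episode end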

YAML Specification File

Other than potentially writing the custom reward and termination functions, a user does not need to do any other RL-Glue coding. Instead, the user defines the robot environment through a YAML configuration file. This YAML file is similar to the RL-Glue Task Spec in that it defines the problem. In the configuration file the user defines not only the RL task but also which portions of the ROS environment will provide the observation and action interfaces. rosglue uses the configuration file to automatically handle all of the messages and to translate between the RL-Glue environment and ROS for the user. The RL researcher is no longer required to program in ROS, and the robot researcher can use RL-Glue agents available in RL-Library without understanding the RL-Glue interface.


Launch files (scripts that launch the appropriate ROS nodes) and sample YAML files also provide a means for users with little previous ROS experience to immediately begin working with the robot.


Example YAML File

To make this more concrete, we show an example YAML file for a task in which the learning agent uses an iRobot Create for navigation. In this task the robot is placed in a maze and needs to find its way to a goal position.

The observations are provided by ROS messages published on the /position topic. This topic has three fields that the agent will use during learning: x, y, and theta (orientation). The actions are provided through a service named /act. The /act service takes a single argument, which can have three values: 0 (right), 1 (forward), 2 (left). The reward and termination are defined by the user in a Python file specified using rosparam.


The example YAML file:

 problemtype: episodic
 discountfactor: 1
 observations: 
   /position:
     x:
      - -0.05
      - 3.0
     y:
      - -0.05
      - 3.0
     theta: 
      - -1.6
      - 3.5
 actions: 
     /act:
        - service
        - arg1:
           - 0
           - 2 
 reward:
    type: glue #ros or glue
    range: 
      - -1
      - 10
 termination:
     type: glue #ros or glue
 extra: iRobotCreate by Sarah Osentoski


This file creates the corresponding Task Spec: "VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1 OBSERVATIONS DOUBLES (2 -.05 3.0) (-1.6 3.5) ACTIONS INTS (0 3) REWARDS (-1.0 10.0) EXTRA iRobotCreate by Sarah Osentoski".

Getting Started

If all of this sounds interesting, this section will help you get started.

Installation

Running Code

  • From a command line, start the RL-Glue server:
 rl_glue
  • Run the agent and experiment, following the appropriate RL-Glue instructions (a minimal experiment sketch is shown after this list)
  • Start the nodes on the robot for the task:
    • For example, in rosglue you will find a file named icreate.launch. This file starts several nodes, including the camera, ar_recog (AR tag recognition), position tracking using odometry and AR tags, and iRobot Create control. It requires ROS nodes from the brown-ros-pkg repository: http://code.google.com/p/brown-ros-pkg/. The file can be launched using the following command:
  roslaunch icreate.launch
  • Specify the reward and termination functions, if necessary, using rosparam
  • Run rosglue:
  rosrun rosglue rosglue.py
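
As a reference for step 2, a minimal RL-Glue experiment program (assuming the standard Python codec) might look like the sketch below; it connects to the running rl_glue server and runs a handful of episodes against whatever agent and environment (here, rosglue) are attached.

 import rlglue.RLGlue as RLGlue
 
 task_spec = RLGlue.RL_init()   # connect to rl_glue and fetch the Task Spec from the environment
 print("Task spec: " + task_spec)
 for episode in range(10):
     RLGlue.RL_episode(100)     # run one episode, capped at 100 steps
     print("Episode %d  return: %f  steps: %d"
           % (episode, RLGlue.RL_return(), RLGlue.RL_num_steps()))
 RLGlue.RL_cleanup()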