Internal:Bthomas weekly agenda
From Brown University Robotics
Revision as of 00:45, 15 March 2011 by Bthomas
Agenda: 2011/03/15 T
As robots become increasingly prevalent and increasingly complicated, a gap exists between the robot's capabilities and people's ability to control the robot to accomplish their goals. One potential solution to this problem is language-based communication. We examine two problems related to this effort. First, we investigate the use of dialog -- a restricted but expressive subset of natural language -- to give end users access to a greater set of robot capabilities. Second, because the robot uses this dialog to interact with the real world, we explore the grounding of spoken nouns and verbs into objects and actions, respectively. While previous work exists in both areas, the emergence of ROS and its community's codebase provides a base upon which we build a framework that implements our dialog system, grounds a varied set of household actions and objects, and demonstrates several real-world use cases.
This paper presents a model (ROGER) and method for automatically and simultaneously segmenting unlabelled training data into subtasks and learning these subtasks using an infinite mixture of Gaussian process experts. The process uses SOGP both to incrementally learn a latent control policy (theoretically allowing for online learning) and to allow real-time data processing (afforded by the model's sparsity). Partitioning between subtasks is achieved using a Chinese restaurant process. Inference is achieved incrementally for each new particle by assigning it to each expert, determining the resultant likelihoods, and using optimal thresholding to carry forward some [one?] of these assignments. Prediction is achieved by picking a particle, choosing an expert for that particle (using the transition matrix), and generating an output using that expert's SOGP regressor. The approach is experimentally validated by learning from a hand-coded controller and comparing the learned controller's performance against the original in the task of goal scoring for robot soccer.
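To make the partitioning step more concrete, here is a minimal sketch of sampling from a Chinese restaurant process prior alone. It is not the paper's inference procedure: ROGER combines this prior with each expert's likelihood and with particle-based thresholding, none of which is modelled here. The function names and the `alpha` concentration value are assumptions chosen for illustration.

```python
import random


def crp_assign(counts, alpha, rng):
    """Sample an expert index for one new point under a CRP prior.

    counts: current number of points held by each existing expert.
    alpha:  concentration parameter (hypothetical value, must be > 0).
    Returns an index in [0, len(counts)]; len(counts) means "open a new expert".
    """
    total = sum(counts) + alpha
    # Existing experts are chosen in proportion to their size,
    # a new expert in proportion to alpha.
    weights = [c / total for c in counts] + [alpha / total]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def crp_partition(n_points, alpha, seed=0):
    """Sequentially partition n_points using only the CRP prior."""
    rng = random.Random(seed)
    counts = []
    assignments = []
    for _ in range(n_points):
        k = crp_assign(counts, alpha, rng)
        if k == len(counts):
            counts.append(1)      # a new expert is created
        else:
            counts[k] += 1        # the point joins an existing expert
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = crp_partition(n_points=20, alpha=1.0)
    print(assignments)
    print(counts)
```

Larger values of `alpha` tend to open more experts, which is one reason the number of subtasks does not need to be fixed in advance.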
The paper presented an interesting idea for extending learning from demonstration into multimap scenarios. Comparing the performance of the implemented system against both the hand-coded controller and one of the previous best-performing implementations established both the performance gained by the developed process and the gap between optimal and current performance. However, the current implementation also required a transition map to be provided; it would be intriguing to see this automatically generated in the future. Further, the current implementation requires significant selection of hyperparameters, although the claim was made that the algorithm was fairly robust to their selection. [I don't know if this actually can be changed significantly.] The section on segmentation analysis, while offering interesting insights, could be shortened. Addressing the problems cited with proposed fixes or research directions would add intrigue to the section. Finally, the paper mentions that many parameters were chosen based on computational limits. Would it be possible to analyze the effect computational power has on the algorithm?
[Note: Although I understood this paper at a high level, the machine learning behind it is still opaque to me. In particular, I know little more than the names of the following: Gaussian processes, SOGP, Inverse-Wishart distributions, POMDP, DPA.]
This paper presents a method for mapping symbolic names of objects to facts about those objects in a knowledge base and implements it as KNOWROB-MAP. KNOWROB-MAP leverages KNOWROB to provide symbolic object names in an environment. OMICS (the Open Mind Indoor Common Sense project, a database of commonsense knowledge for indoor mobile robots) is used in conjunction with Cyc (which categorizes and provides dictionary descriptions of objects) via WordNet, which maps the natural language descriptions in OMICS to word meanings. (A map between these meanings and Cyc already exists.) By combining these databases, formal ontological concepts of words are formed. This knowledge is represented in the Web Ontology Language (OWL), which distinguishes between instances and classes and additionally provides connections between instances/classes via roles. The concept is further expanded into probabilistic environmental models using Bayesian Logic Networks. [I don't know about these yet and thus don't quite understand the reasoning behind this section.] Finally, a ROS service is provided to enable language-independent queries of KNOWROB-MAP. The efficacy of the system was tested with the instruction "clean the table".
Demonstrating the power of connecting multiple large-scale databases is an intriguing concept, as is the fact that this connection was done automatically. However, the performance of KNOWROB-MAP is evaluated with only a single query. This, in many ways, fails to demonstrate the power of the system. It would be interesting to see how KNOWROB-MAP performs with other queries; in particular, what would happen if typical people tried to instruct the robot to do something? Further, seeing a robot actually perform this task, instead of detailing the outcome of a query, could add credence to the merits of KNOWROB-MAP. Using standard languages such as OWL [is this actually standard?] and connecting KNOWROB-MAP to ROS will enable others to use this software with minimal effort. Computational time and scalability were not mentioned; are they always trivial? (One concern: ROS service calls are blocking.) Finally, it would be nice if the section on probabilistic environmental models were elaborated on more thoroughly; the implementation descriptions throughout the previous sections of the paper could be shortened to accommodate this.
Agenda: 2011/03/01 T
This paper presents a method for following natural-language route instructions using four actions with pre- and post-conditions, implemented in a system called "Marco". Natural language instructions were modelled through parsing, extracting each sentence's surface meaning, and modelling inter-sentence spatial and linguistic knowledge. Given this model and the perception of the environment, an executor determines which of the four actions (namely, Turn, Travel, Verify, and Declare-goal) to take, a process dubbed "compound action specification". Implicit actions, including Travel and Turn, are inferred when necessary. (For instance, the instruction "Go to the chair." may first require Marco to turn to find the chair.) To evaluate Marco's performance, approximately 700 instructions were created by 6 participants over 3 virtual worlds. Another set of 36 participants followed these instructions. Each instruction was followed 6 times, and both success at reaching the desired goal point and the participant's subjective rating of the instruction were recorded. Each instruction was parsed and hand-verified for Marco, and Marco attempted to follow each parsed instruction set. A statistically significant difference was found between Marco's abilities with implicit actions and without them.
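The "Go to the chair." example can be illustrated with a toy sketch of inferring an implicit Turn before a Travel. This is not Marco's implementation: the `ToyWorld` class, its `views` dictionary, the `go_to` function, and the choice to treat "target is in view" as the Travel precondition are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """Hypothetical world: maps each heading to the objects seen when facing it."""
    views: dict
    heading: str = "north"
    log: list = field(default_factory=list)

    def visible(self, target):
        return target in self.views.get(self.heading, ())

    def turn_until_visible(self, target):
        """Implicit Turn: face the first heading from which the target is seen."""
        for heading, seen in self.views.items():
            if target in seen:
                self.heading = heading
                self.log.append(("Turn", heading))
                return True
        return False


def go_to(world, target, infer_implicit=True):
    """Follow 'Go to the <target>.' in the toy world.

    Assumed precondition of Travel: the target is in view. When it is not,
    an implicit Turn is inferred only if infer_implicit is True.
    """
    if not world.visible(target):
        if not (infer_implicit and world.turn_until_visible(target)):
            return False                      # precondition cannot be met
    world.log.append(("Travel", target))
    world.log.append(("Verify", target))      # assumed post-condition check
    return True


if __name__ == "__main__":
    world = ToyWorld(views={"north": {"lamp"}, "east": {"chair"}})
    print(go_to(world, "chair"), world.log)
```

With `infer_implicit=False`, the same instruction fails in this toy world because the chair is not in view from the starting heading, which mirrors the with/without comparison described above.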
The paper suggests that the four actions present are sufficient for many route-following tasks. However, many real-world obstacles such as doors, stairs, and multiple floors are typically present. It would be interesting to see how this could scale to a real-world domain, and it would be especially enlightening to see this task performed in the real world by a robot. Marco's performance was evaluated differently from that of the humans. (Humans started in a random direction, and Marco made four attempts from four different directions and averaged (How?) the results.) It would be helpful if a rationale for this decision were included or if the data were reevaluated with Marco starting out in a random orientation to match the human trials. Finally, the paper's abstract claims that Marco "follows free-form, natural language route instructions". However, only hand-parsed trees were evaluated, and the parsing methodology was only briefly discussed. Please provide a rationale for why the parser was not used in the evaluation. Further, more detail on the parsing involved -- especially on how pre-conditions and post-conditions were formed -- would be appreciated. The section comparing this paper to Instruction-Based Learning could be cut to allow for the space necessary to describe this.
Agenda: 2011/02/22 T
Agenda: 2011/02/15 T
Agenda: 2011/02/08 T