Internal:Bthomas weekly agenda

From Brown University Robotics

Revision as of 18:41, 26 September 2011 by Bthomas (Talk | contribs)


Agenda: 2011/05/10 T

FrameNet Notes:

  • Introduction
    • What is FrameNet?
      • Valency = Number of arguments controlled by a verbal predicate.
        • Arguments include subject, direct object, indirect object, etc.
        • In English, 0-4 arguments.
      • Case Grammar = Studies semantic roles (Agent, Object, Beneficiary, Location, Instrument) required by a verb.
        • "Jones (A) gave money (O) to the school (B)."
        • Looks at link between valence of a verb and its required grammatical context.
      • Frame Semantics = You can't understand the meaning of a word without the knowledge related to that word.
        • Extends case grammar by relating linguistic semantics (as in CG) with encyclopaedic knowledge.
        • eg, "sell" requires you know buyer, seller, goods, money, relation between the aforementioned, commercial transfer, etc.
      • FrameNet = Electronic resource based on frame semantics
        • 800 semantic frames
        • 10000 lexical units (pair word with meaning)
        • 120000 example sentences
          • Annotate all combinatorial possibilities of lexical units.
        • Document range of valences of each word in each of its senses.
    • Components
      • Semantic frame = "script" describing a type of situation/object/event along with its participants and props
      • Lexical unit = pairing of word with meaning (eg word with its corresponding semantic frame)
      • Frame elements = participants and props of a semantic frame
      • Dependents = don't evoke own semantic frame. Eg, most nouns.
    • Frame relations
      • Inheritance ("[child] is a [parent]")
      • Using: ("[child] presupposes [parent]", eg Speed frame presupposes Motion frame)
      • Subframe ("[child] is a subevent of [parent]", eg Arrest/Trial/Sentencing frames are subframes of Criminal_process)
      • Perspective on ("[child] provides perspective on [parent]", eg Hiring/Get_a_job frames are perspectives on Employment_start)
  • Development
    • Constraints for lexical units
      • All lexical units in a frame must have the same number of frame elements
      • All lexical units in a frame must have the same type of frame elements
      • Lexical units should entail the same sets of stages and transitions (eg, shoot and decapitate are different, because the latter entails a person dies where the former does not)
      • All lexical units will emphasize the same person's point of view
      • Interrelations between frame elements must be the same for all LUs in a frame.
      • Preconditions, expectations, and concurrencies of targets within a frame will be shared.
      • Denotation of targets in a frame should be similar (eg, blue and green are in the same frame, but blue and broken are not)
      • Frame-evoking element prespecifications given to frame elements will be similar. eg crowd, flock, swarm, etc. belong to Mass_Motion and not Self_Motion.
    • Non-Constraints for lexical units
      • Tense
      • Passivity
      • Antonyms (eg, high and low are in the Position_on_a_scale frame.)
      • Context (deixis (come v. go), register (botch v. f- up), dialect (lorry v. truck), evaluation (genius v. moron))
      • In general, the focus is on paraphrasability -- ie, ability to substitute one LU for another within a frame while still expressing the same frame, roles, and interactions.
  • Annotation
    • Triplet (frame element (eg Food, Heating_instrument, Final_value), grammatical function (eg Subject, Object), phrase type (eg noun phrase, verb phrase))
    • Guidelines
      • Dependents of target words only, not context-implicit ones (eg Smith not annotated in "Smith was surprised when Lowry retaliated for the attack.")
      • Whole constituent labelled ("Peter Pan", not "Peter")
      • Each dependent annotated for frame element identity, phrase type, and grammatical function wrt target LU
    • Verbs
      • Easy case: All and only core frame elements (conceptually necessary participants of semantic frame)
      • Expletives
        • = non-referential material with no semantic relationship to the predicate
        • [It NULL] is clear that we won't finish on time.
      • Aspect
        • chattering [away]
    • Nouns
      • Support expressions
        • [taking] revenge
          • About an act of revenge, not about an act of taking, but revenge here is a noun!
        • It's lexicographically necessary to record them -- support verbs are selected by nouns ("we [had] an argument" vs. "he [made] an argument")
      • Copulas
        • Target noun is a support preposition and projects a finite clause.
        • "[is COP] [on SUPP] fire"
  • Identifying phrase types
    • Noun phrases
      • Non-referential, aka expletives ("There was a row", "It was raining")
      • Possessive ("Their arrival")
      • Non-maximal nominal ("[fast food] allergy")
      • Standard ("[John] said hi.")
    • Verb phrases
      • Finite ("Who do you think [ate the sandwich]?")
      • Non-finite
        • Bare stem infinitives ("We made the children [take naps].")
        • To-marked infinitives ("The cat wants [to go outside].")
        • Verb phrase relatives ("Towels [to dry yourself with] ...")
        • Participial verb phrases ("...the man [shown on the photograph] ...")
        • Gerundive verb phrases ("...likes [running barefoot].")
  • Grammatical functions
    • Assigned by target verbs -- External argument, Object, Dependent
    • Assigned by target nouns -- External argument, Genitive determiner, Dependent, Appositive
    • External argument = outside of maximal phrase headed by target word but functionally linked to target word ("[The physician] *performed* the surgery")
    • Object = normal object ("They expect [us]")
    • Dependent = union(arguments, adjuncts) ("the fact [that cats have fur]")
    • Genitive determiner = "[your] book", "[your work's] influence"
    • Appositive = "lawyer [Jonathan Crystal]", "girlfriend [Carolyn Homer]"
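
To check my own understanding of how the pieces above fit together (frames, frame elements, lexical units, and the annotation triplet), here is a minimal Python sketch. The class and field names are my own assumptions and not FrameNet's actual data format; the frame name "Commerce_sell" and the example sentence are illustrative, built around the "sell" example from the notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """A semantic frame: a 'script' with its participants and props."""
    name: str
    frame_elements: List[str]
    # Frame relations from the notes; each names a parent frame.
    inherits_from: Optional[str] = None
    uses: Optional[str] = None
    subframe_of: Optional[str] = None
    perspective_on: Optional[str] = None


@dataclass
class LexicalUnit:
    """Pairing of a word with the frame it evokes."""
    lemma: str
    frame: Frame


@dataclass
class Annotation:
    """One annotated dependent: its text plus the (FE, GF, PT) triplet."""
    text: str
    frame_element: str
    grammatical_function: str
    phrase_type: str


def annotate(lu, text, fe, gf, pt):
    """Build an annotation, rejecting FEs that the LU's frame does not define."""
    if fe not in lu.frame.frame_elements:
        raise ValueError(f"{fe!r} is not a frame element of {lu.frame.name}")
    return Annotation(text, fe, gf, pt)


# Illustrative only: the frame name and the sentence are assumptions.
commerce_sell = Frame("Commerce_sell", ["Buyer", "Seller", "Goods", "Money"])
sell = LexicalUnit("sell", commerce_sell)

# Relation example from the notes (frame elements omitted here).
arrest = Frame("Arrest", frame_elements=[], subframe_of="Criminal_process")

# "[Jones] sold [the car]."
annotations = [
    annotate(sell, "Jones", "Seller", "External argument", "noun phrase"),
    annotate(sell, "the car", "Goods", "Object", "noun phrase"),
]
```

Validating the frame element against the frame is the one design choice here: it mirrors the constraint that all lexical units in a frame share the same frame elements.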

Agenda: 2011/05/03 T

  • Created a list of verbs that I'd like the robot to do. (These will then be the verbs in my Action Hierarchy Model.)

move, go, grab, fetch, retrieve, find, pick up, put down, place, set, assemble, combine, disassemble, break, follow, track, watch, open (hinged), close (hinged), open (drawer), close (drawer), open (jar), close (jar), turn on, turn off, stir, chop, mix, shake, pour, fold, pose (for a picture), shake (hands), clean

  • Talked with Pete about combining our work. Essentially, he'll be doing all the front-end stuff and the dialog-> verb part, and I'll focus on the verb->instantiation part.
  • Next week: Will read FrameNet book and be ready to explain it. (In particular, I'll look at how they deal with verb slots.)

Agenda: 2011/04/26 T

  • Began working on implementing proposed research. In particular, I started implementing the Action Hierarchy Model from verbs down to instantiations.

Agenda: 2011/04/19 T

  • Proposal done!

Agenda: 2011/04/12 T

  • Worked on proposal.

Agenda: 2011/04/05 T

  • Research proposal date: 2011/04/18 M 10am-11am
    • => The following deadlines:
      • 2011/04/10 U: Send title, abstract, and URL to your proposal document to Chad. Chad needs to approve this before Lauren can send out the information to the faculty.
      • 2011/04/11 M: Send Lauren Clarke your *advisor-approved* title, abstract, and URL to your proposal document
  • NSF gave me an Honorable Mention for the GRFP.
  • Working on slides and document.

Agenda: 2011/03/29 T

  • R+R Sara Kiesler - Fostering Common Ground in Human-Robot Interaction (IEEE Workshop on Robots and Human Interactive Communication 2005)

This paper claims that robots interacting with humans need to be able to establish common ground. The common ground postulate states that people in conversation minimize their collective effort necessary to gain understanding; that is, they communicate with sufficient detail that each understands, but no more. Although humans are innately able to do this by modeling their peers and estimating their knowledge and thus the pair's common ground, robots lack this ability and often frustrate their human partners by saying too much or too little. Further, humans lack mental models of robots and thus often impair the robot's ability to understand by saying too much or too little. Human mental models can be aided by bootstrapping our models of other humans and applying these characteristics to robots, for instance by having robots don job-appropriate attire. The effect of robot appearance on human perception of common ground is demonstrated experimentally with a robot "dating counselor" HRI experiment, where the robot is given either a male or a female appearance and interacts with both male and female participants to "build a database" of dating knowledge, and length of communication is noted.

This paper reveals an interesting point that having only one mode of interaction between humans and robots is insufficient -- different humans have different needs due to different background knowledge, and the same applies for robots. However, while this statement is implied several times in the paper, supported by various previous experiments, it is never explicitly stated and is instead hidden under the guise of "different common grounds". Further, although the paper illustrates that these different common grounds exist, it does not attempt to quantify them (except in one previous experiment) or show how numerically significant these differences are. This is particularly noticeable in the paper's own experiment. Claims of observing different common grounds with the dating counselor robot lacked numerical evidence to support them. It would be interesting to see how significant these differences are, and the paper names simple metrics (for instance, number of words) that should be easy to measure. Finally, most of the paper summarized others' work, and little space was used to describe this paper's new contribution; it would have been nice to see a more formal presentation of the experiment.

  • Committee done.
    • Chad
    • Brian
    • Candy
    • James
    • I never heard back from Michael Beetz, so I'm assuming a no from him.
  • Tentative research proposal date: 2011/04/14 R 2pm-3pm
    • => The following deadlines:
      • 2011/04/06 Wed.: Send title, abstract, and URL to your proposal document to Chad. Chad needs to approve this before Lauren can send out the information to the faculty.
      • 2011/04/07 Thu.: Send Lauren Clarke your *advisor-approved* title, abstract, and URL to your proposal document
  • Working on research. I have a very hacked-together system which does basic movement on the iCreate, with a text file that allows you to enumerate possible dialogs (as regexes) and their corresponding actions. Things to do coming up:
    • I am using actionlibs to contain each action. (ie, Each action is a server.) I believe this+smach is what Willow uses for their actual demos. Converting from nodes to actionlibs is relatively straightforward (though not enough to do so automatically).
    • I'm only doing primitive tasks, not breaking up sentences. This is next on my list. Particularly, the challenges are sequential tasks ("___. ___.", "___, then ___", etc.) and concurrent tasks ("___ while ___", "___ until ___", etc.). I want to spend enough time on this that the results are convincing, but it's also not the core problem we're trying to solve.
    • Basic web frontend. From an academic perspective, it's trivial. (I'm currently just passing in messages through text.)
    • Adding more operations. It's especially important for our claims that this is straightforward.
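
A minimal sketch of the regex-to-action table, plus a first pass at splitting sequential tasks ("___. ___." and "___, then ___"). The patterns, action names, and function names are hypothetical and inlined for illustration; the real system reads its pairs from a text file whose format isn't shown here. Concurrent forms ("while", "until") are not handled.

```python
import re

# Hypothetical dialog table: (regex, action name). In the real system these
# pairs would be loaded from a text file; they are inlined here.
DIALOG_TABLE = [
    (re.compile(r"^(?:go|move|drive) forward$", re.IGNORECASE), "drive_forward"),
    (re.compile(r"^turn (left|right)$", re.IGNORECASE), "turn"),
    (re.compile(r"^stop$", re.IGNORECASE), "stop"),
]

# Separators for sequential tasks: a period, or an optional comma plus "then".
SEQUENCE_SPLIT = re.compile(r"\s*(?:\.|,?\s*\bthen\b)\s*", re.IGNORECASE)


def match_primitive(utterance):
    """Return (action, args) for the first matching pattern, else None."""
    for pattern, action in DIALOG_TABLE:
        m = pattern.match(utterance.strip())
        if m:
            return action, m.groups()
    return None


def parse_sequence(utterance):
    """Split an utterance into primitive commands and match each in order."""
    parts = [p for p in SEQUENCE_SPLIT.split(utterance) if p.strip()]
    return [match_primitive(p) for p in parts]
```

For example, `parse_sequence("go forward, then turn left.")` yields the `drive_forward` action followed by `turn` with the captured argument `left`; an unrecognized fragment shows up as `None` in the list so the caller can ask for clarification.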

Agenda: 2011/03/15 T


  • Slides for robot dialog
  • Round 2 of abstract
    • I need to write to prospective committee members today (Candy Sidner, Eugene Charniak, Michael Beetz) and attach this / its future revision.

As robots become increasingly prevalent and increasingly complicated, a gap exists between the robot's capabilities and people's ability to control the robot to accomplish their goals. One potential solution to this problem is through language-based communication. We examine two problems related to this effort. First, we investigate the use of dialog -- a restricted but expressive subset of natural language -- to empower end users with a greater set of robotic capability. Second, because the robot uses this dialog to interact with the real world, we explore the grounding of spoken nouns and verbs into objects and actions, respectively. While previous work exists in both areas, the emergence of ROS and its community's codebase provides a base upon which a more expansive and task- upon which we provide a framework that implements our dialog system, grounds a varied set of household actions and objects, and demonstrates several real-world use cases.

  • R+R Grollman (Brown) Incremental Learning of Subtasks from Unsegmented Demonstration (IROS 2010)

This paper presents a model (ROGER) and method for automatically and simultaneously segmenting unlabelled training data into subtasks and learning these subtasks using an infinite mixture of Gaussian process experts. The process uses SOGP both to incrementally learn a latent control policy (theoretically allowing for online learning) and to allow real-time data processing (afforded by the model's sparsity). Partitioning between subtasks is achieved using a Chinese restaurant process. Inference is achieved incrementally for each new particle by assigning it to each expert, determining the resultant likelihoods, and using optimal thresholding to carry forward some [one?] of these assignments. Prediction is achieved by picking a particle, choosing an expert for that particle (using the transition matrix), and generating an output using that expert's SOGP regressor. The approach is experimentally validated by learning from a hand-coded controller and comparing the learned controller's performance against the original in the task of goal scoring for robot soccer.
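
For my own reference, the Chinese restaurant process mentioned above can be sketched in a few lines. This is only the generic textbook CRP prior, not ROGER's inference procedure; the function name and the concentration parameter `alpha` are my own choices.

```python
import random


def crp_assignments(n, alpha, seed=None):
    """Sample table assignments for n customers from a CRP(alpha) prior.

    With i customers already seated, the next customer joins existing
    table k with probability count_k / (i + alpha) and opens a new table
    with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        table = len(counts)  # default: open a new table
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                table = k
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        assignments.append(table)
    return assignments, counts
```

In the paper's setting each "table" would correspond to a subtask (expert), so the number of subtasks does not need to be fixed in advance.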

The paper presented an interesting idea for extending learning from demonstration into multimap scenarios. Comparing the performance of the implemented system against both the hand-coded controller and one of the previous best-performing implementations established both the performance gained by the developed process and the gap between optimal and current performance. However, the current implementation also required a transition map to be provided; it would be intriguing to see this automatically generated in the future. Further, although the claim was made that the algorithm was fairly robust to their selection, the current implementation requires significant selection of hyperparameters. [I don't know if this actually can be changed significantly.] The section on segmentation analysis, while offering interesting insights, could be shortened. Addressing the problems cited with proposed fixes or research directions would add intrigue to the section. Finally, the paper mentions that many parameters were chosen based on computational limits. Would it be possible to analyze the effect computational power has on the algorithm?

[Note: Although I understood this paper at a high level, the machine learning behind it is still opaque to me. In particular, I know little more than the name for: Gaussian processes, SOGP, Inverse-Wishart distributions, POMDP, DPA.]

  • R+R Beetz (TUM) KNOWROB-MAP -- Knowledge-Linked Semantic Object Maps
    • This could tie in nicely to the robot dialog project.

This paper presents a method for mapping symbolic names of objects to facts about that object in a knowledgebase and implements it as KNOWROB-MAP. KNOWROB-MAP leverages KNOWROB to provide symbolic object names in an environment. OMICS (the Open Mind Indoor Common Sense project, a database of commonsense knowledge for indoor mobile robots) is used in conjunction with Cyc (which categorizes and provides dictionary descriptions of objects) via WordNet, which maps the natural language descriptions in OMICS to word meanings. (A map between these meanings and Cyc already exists.) By combining these databases, formal ontological concepts of words are formed. This knowledge is represented in the Web Ontology Language (OWL), which allows distinguishability between instances and classes and additionally provides connections between instances/classes via roles. The concept is further expanded into probabilistic environmental models using Bayesian Logic Networks. [I don't know about these yet and thus don't quite understand the reasoning behind this section.] Finally, a ROS service is provided to enable language-independent queries of KNOWROB-MAP. The efficacy of the system was tested by the instruction "clean the table".

Demonstrating the power of connecting multiple large-scale databases is an intriguing concept, as was the fact that this connection was done automatically. However, the performance of KNOWROB-MAP is evaluated with only a single query. This, in many ways, fails to demonstrate the power of the system. It would be interesting to see how KNOWROB-MAP performs with other queries; in particular, what would happen if typical people tried to instruct the robot to do something? Further, seeing a robot actually perform this task, instead of detailing the outcome of a query, could add credence to the merits of KNOWROB-MAP. Using standard languages such as OWL [is this actually standard?] and connecting KNOWROB-MAP to ROS will enable others to use this software with minimal effort. No mention of computational time and scalability was given; is it always trivial? (One concern: ROS service calls are blocking.) Finally, it would be nice if the section on probabilistic environmental models were elaborated on more thoroughly; the implementation descriptions throughout the previous sections of the paper could be shortened to accommodate this.

Working on:

  • Demos for recruiting weekend: AR.Drone wiimote + Nolan3D
  • Robot dialog
    • Setting up github
    • Using actionlibs to allow for preemptible, non-blocking routines. (eg, this will work really well for the "until" statement.)
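
A rough sketch of why preemption suits the "until" statement: the action runs in small steps and a preempt condition is checked between them. This is plain Python, not the actionlib API; `run_until`, the step function, and the simulated wall distance are all hypothetical.

```python
def run_until(step, should_preempt, max_steps=1000):
    """Run `step` repeatedly until `should_preempt()` becomes true.

    Returns "preempted" if the condition fired, "succeeded" if `step`
    reported completion by returning True, or "aborted" after max_steps.
    """
    for _ in range(max_steps):
        if should_preempt():
            return "preempted"
        if step():
            return "succeeded"
    return "aborted"


# "Drive forward until you are within 1 m of the wall" (simulated).
state = {"distance_to_wall": 3.0}


def drive_step():
    state["distance_to_wall"] -= 0.5  # pretend the robot moved 0.5 m
    return False                      # driving never finishes on its own


def near_wall():
    return state["distance_to_wall"] <= 1.0


result = run_until(drive_step, near_wall)
```

Here the "until" clause is just the preempt condition of the driving action, so the main action does not need to know why it is being stopped.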

Agenda: 2011/03/01 T

  • Kuipers Walk the talk: Connecting language, knowledge, and action in route instructions (AAAI 2006)

This paper presents a method for following natural-language route instructions using four actions and pre- and post-conditions using a system called "Marco". Natural language instructions were modelled through parsing, extracting each sentence's surface meaning, and modelling inter-sentence spatial and linguistic knowledge. Given this model and the perception of the environment, an executor determines which of the four actions (namely, Turn, Travel, Verify, and Declare-goal) to take, a process dubbed "compound action specification". Implicit actions including the actions Travel and Turn are inferred when necessary. (For instance, the instruction "Go to the chair." may first require Marco to turn to find the chair.) To evaluate Marco's performance, approximately 700 instructions were created by 6 participants over 3 virtual worlds. Another set of 36 participants followed these instructions. Each instruction was followed 6 times, and both success at reaching the desired goal point and the participant's subjective rating of the instruction were recorded. Each instruction was parsed and hand-verified for Marco, and Marco attempted to follow each parsed instruction set. A statistically significant difference was found between Marco's abilities with implicit actions and without them.

The paper suggests that the four actions present are sufficient for many route-following tasks. However, many real-world obstacles such as doors, stairs, and multiple floors are typically present. It would be interesting to see how this could scale to a real-world domain, and it would be especially enlightening to see this task performed in the real world by a robot. The procedure used to evaluate Marco differed from the one used for the humans. (Humans started in a random direction, and Marco made four attempts from four different directions and averaged (How?) the results.) It would be helpful if a rationale for this decision were included or if the data were reevaluated with Marco starting out in a random orientation to match the human trials. Finally, the paper's abstract claims that Marco "follows free-form, natural language route instructions". However, only hand-parsed trees were evaluated, and the parsing methodology was only briefly discussed. Please provide a rationale for why the parser was not used in the evaluation. Further, more detail on the parsing involved -- especially on how pre-conditions and post-conditions were formed -- would be appreciated. The section comparing this paper to Instruction-Based Learning could be cut to allow for the space necessary to describe this.

  • ... of particular interest to me (next paper to read?):
    • Complicated executors (as opposed to one-at-a-time executors)
      • Full action sequencers
        • RAPs (Bonasso et al. 1997)
        • TDL (Simmons et al. 2003)
      • Reasoning on an inferred route topology (Kuipers et al. 2004)
  • TODO: I need a 3-person committee (advisor + 2) by 2011/03/15 T. Research proposal must be performed by 2011/04/21 R. Any thoughts?
  • Finished ROS smach tutorials
    • Pros of system?
      • Abstracts lots of FSM details
        • Regular FSMs
        • Some amount of concurrency
        • Has data passing
      • Visualizer is really nice
      • Generic state types are nice
    • Cons of system?
      • Data passing is annoying
        • There's no known way to connect two pieces of data that are named differently in the global space, so mass-renaming is the only solution
      • Smach creates "just a big chunk of code": no ROS node starting/killing
        • That means all the functionalities you might ever need have to be started on boot
        • Can we code something to manage this process, even if it's Linux-/Ubuntu-specific?
      • Concurrence requires all children to terminate. Can we hack around this? (Do we need to?)
      • You have to make everything you want to code a state.
        • Somehow, you'll also have to specify what they need node-wise to run.
        • smach obviously wasn't designed for the task-at-hand in this regard
    • Things smach probably can solve
      • Deterministic FSM, where each node is independent
    • Things smach may not be able to solve
      • Perception requiring motor control. In other words, combining multiple overlapping motor requests intelligently. [Note: I think this is a very interesting general question.]
      • Created state machines may be _just barely_ human-readable
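
To make the pros and cons above concrete, here is a minimal plain-Python sketch of the kind of FSM bookkeeping smach abstracts: states return outcome strings, a transition table maps (state, outcome) to the next state, and a shared dict stands in for the data passed between states. This is not the smach API; every name below is hypothetical.

```python
def find_object(userdata):
    """Pretend perception step: records where the object was seen."""
    userdata["object_pose"] = (1.0, 2.0)
    return "found"


def grab_object(userdata):
    """Pretend manipulation step: needs the pose under the same key name."""
    return "grabbed" if "object_pose" in userdata else "failed"


STATES = {"FIND": find_object, "GRAB": grab_object}

# (state, outcome) -> next state; names not in STATES are terminal outcomes.
TRANSITIONS = {
    ("FIND", "found"): "GRAB",
    ("FIND", "not_found"): "aborted",
    ("GRAB", "grabbed"): "succeeded",
    ("GRAB", "failed"): "aborted",
}


def run_machine(start, userdata, max_steps=100):
    """Step through the machine until a terminal outcome is reached."""
    current = start
    for _ in range(max_steps):
        if current not in STATES:
            return current
        outcome = STATES[current](userdata)
        current = TRANSITIONS[(current, outcome)]
    raise RuntimeError("state machine did not terminate")
```

Both states have to agree on the key name "object_pose", which is the coupling behind the data-passing complaint: two independently written states that name the same datum differently cannot be chained without renaming one of them.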

Agenda: 2011/02/22 T

  • Chernova and Breazeal: Crowdsourcing HRI Through Online Multiplayer Games (AAAI 2010)
  • Roy: Toward Understanding Natural Language Directions (HRI 2010)
  • Cleaned up the lab; it's pretty now.
  • Presented AR.Drone demo to PhD recruiting heads. They want to show off the lab to everyone. The demo is on 2011/03/18 F at 2pm-3pm and will include:
    • AR.Drone + wiimote
    • nolan3d
    • Chad talking about the lab and what we do?

Agenda: 2011/02/15 T

  • Got AR.Drones ready for Chad's presentation, and documented process here.

Agenda: 2011/02/08 T

  • Done
    • Ordered ardrone replacement parts; arrival in ~1-2 weeks?
  • In progress (highest priority first)
    • This semester's courses: machine learning (Erik), robots for education (Chad), reviewing linear algebra (self)
    • Get ardrones ready for Chad's presentation this weekend
    • Starting to work on robot dialog project with Pete White

Previous semesters

Spring 2011 Fall 2010