When you say to a human, "Please make me some cheesy toast", there is a whole range of actions implicit in that request that are immediately understood: retrieving bread and cheese from wherever they're kept, slicing both where required, turning on the grill, toasting the bread, and so forth.
With a robot, it's a little more complicated. You have to teach it that making cheesy toast includes all of these actions, then account for variables, such as sliced bread and unsliced bread, and then account for variations in phrasing. That's what Professor Ashutosh Saxena and doctoral students Dipendra Misra and Jaeyong Sung are trying to achieve (PDF) in the Cornell Robot Learning Lab.
The idea is to teach the robots to understand basic commands in colloquial English from different speakers, filling in missing information and adapting to their environment.
For example, the instruction, "Place the pot on the tap and turn the tap on. When it is filled, turn the tap off and heat the pot" includes several missing steps that the robot will have to account for: knowing where the pot and tap are located, moving to those locations, knowing that "heat the pot" implies using the stove, and knowing where the stove is.
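The idea of filling in missing steps can be sketched as a simple plan expander. This is a hypothetical illustration, not the Cornell system's actual representation: the object locations, action names, and `move_to` step are all assumptions made for the example.

```python
# Hypothetical sketch: expanding a terse instruction into a full plan by
# inserting the implicit navigation steps. The locations and action names
# below are illustrative assumptions, not the lab's actual model.

LOCATIONS = {"pot": "counter", "tap": "sink", "stove": "stove_top"}

def expand(steps):
    """Insert the implicit move_to step before each manipulation."""
    plan = []
    position = None
    for action, obj in steps:
        target = LOCATIONS[obj]
        if position != target:          # missing step: walk there first
            plan.append(("move_to", target))
            position = target
        plan.append((action, obj))
    return plan

# "Place the pot on the tap and turn the tap on..."
steps = [("grasp", "pot"), ("place_on", "tap"), ("turn_on", "tap")]
print(expand(steps))
# → [('move_to', 'counter'), ('grasp', 'pot'),
#    ('move_to', 'sink'), ('place_on', 'tap'), ('turn_on', 'tap')]
```

The robot's parsed command mentions only the manipulations; the expander supplies the travel between them, which is exactly the kind of "missing information" the instruction leaves out.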
The robot is equipped with a 3D camera for viewing its environment and runs software, developed in the lab, that recognises the objects around it and links them to knowledge of how to use them. This means it knows what a pot is for, even if the pot is moved, and how to use a stove.
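One way to picture linking recognition to usage is to key the robot's knowledge to an object's label rather than its position, so the knowledge survives when the object moves. The affordance table and detection format here are illustrative assumptions, not the lab's software.

```python
# Hypothetical sketch: attaching usage knowledge ("affordances") to
# recognised object labels, so moving an object doesn't erase what the
# robot knows about it. All names below are illustrative assumptions.

AFFORDANCES = {
    "pot": {"fill", "carry", "heat"},
    "stove": {"turn_on", "turn_off"},
}

def detections_to_knowledge(detections):
    """detections: list of (label, position) pairs from the 3D camera."""
    return {label: {"position": pos, "can": AFFORDANCES.get(label, set())}
            for label, pos in detections}

scene = detections_to_knowledge([("pot", (0.2, 1.1)), ("stove", (2.0, 0.5))])
moved = detections_to_knowledge([("pot", (3.0, 0.0))])   # pot relocated
assert scene["pot"]["can"] == moved["pot"]["can"]        # knowledge unchanged
```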
It has a set of templates for common actions, and has been trained to associate a string of actions with flexible commands; for example, "Take the pot to the stove", "Carry the pot to the stove", "Put the pot on the stove" and "Go to the stove and heat the pot" all mean the same thing to the robot.
Of course, it's not perfect yet: trained to make ramen noodles and affogato, the robot gets all the actions correct 64 percent of the time, even when the commands are varied or the environment has changed. Still, that's a big step forward; the researchers say it's three to four times more accurate than previous methods.
But it's only going to get better: the team is currently seeking help from the public, who can sign up on the Tell Me Dave website. A video game-like interface is used to teach a simulated robot a simple kitchen task; this input then becomes part of a large crowdsourced library of instructions for the robots at Cornell.
"With crowdsourcing at such a scale, robots will learn at a much faster rate," Professor Saxena said.