Behavior cloning (BC) has become a staple imitation learning paradigm in robotics because it lets robots learn complex skills directly from expert demonstrations. However, BC suffers from an inherent generalization problem. The status quo solution is to gather more data, yet regardless of how much training data is available, out-of-distribution performance remains sub-par, lacks any formal guarantee of convergence and success, and cannot accommodate or recover from physical interaction with humans. These are critical flaws when robots are deployed in ever-changing, human-centric environments. We therefore propose Elastic Motion Policy (EMP), a one-shot imitation learning framework that allows robots to adjust their behavior to scene changes while respecting the task specification. Trained from a single demonstration, EMP follows the dynamical-systems paradigm, in which motion planning and control are governed by first-order differential equations with convergence guarantees. We leverage Laplacian editing in the full end-effector space, $\mathbb{R}^3\times SO(3)$, together with online convex learning of Lyapunov functions, to adapt EMP online to new contexts without collecting new demonstrations. We extensively validate our framework in real robot experiments, demonstrating robust and efficient performance in dynamic environments, including obstacle avoidance and multi-step tasks.
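As a rough illustration of the Laplacian-editing step, the sketch below deforms a demonstrated path so that constrained waypoints (e.g., a moved goal) are met while the local shape of the trajectory is preserved. We show positions in $\mathbb{R}^3$ only; extending to $SO(3)$ requires editing orientations in a suitable tangent space, which we omit. All names here are illustrative, not from the authors' code.

```python
# Minimal sketch of Laplacian trajectory editing in R^3 (positions only).
# Assumptions: a chain-graph Laplacian over waypoints and soft positional
# constraints solved in least squares; not the authors' exact implementation.
import numpy as np

def path_laplacian(n):
    """Graph Laplacian of a simple chain graph over n waypoints."""
    L = np.zeros((n, n))
    for i in range(n):
        if i > 0:
            L[i, i - 1] = -1.0
            L[i, i] += 1.0
        if i < n - 1:
            L[i, i + 1] = -1.0
            L[i, i] += 1.0
    return L

def laplacian_edit(P, constraints, weight=1e3):
    """Deform waypoints P (n x 3) so constrained points move to new
    targets while the Laplacian (local-shape) coordinates are preserved.

    constraints: dict {waypoint index: new position (3,)}
    """
    n = P.shape[0]
    L = path_laplacian(n)
    delta = L @ P                        # Laplacian coordinates of the demo
    # Soft positional constraints, stacked under the Laplacian system.
    S = np.zeros((len(constraints), n))
    C = np.zeros((len(constraints), 3))
    for row, (i, target) in enumerate(constraints.items()):
        S[row, i] = np.sqrt(weight)
        C[row] = np.sqrt(weight) * np.asarray(target)
    A = np.vstack([L, S])
    b = np.vstack([delta, C])
    P_new, *_ = np.linalg.lstsq(A, b, rcond=None)
    return P_new

# Example: re-target the end of a demonstrated path to a moved goal.
t = np.linspace(0, 1, 50)
demo = np.stack([t, np.sin(np.pi * t), np.zeros_like(t)], axis=1)
edited = laplacian_edit(demo, {0: demo[0], len(t) - 1: [1.2, 0.3, 0.1]})
```

With both endpoints constrained, the least-squares system is well-posed, and interior waypoints bend smoothly toward the new goal while keeping the demonstrated shape.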
A single demonstration of placing a book into a bookrack
Here is a non-stop video showing that the learned policy adapts. Note that the robot begins execution once the object's movement settles. We also show that the learned policy handles perturbations.
To show how the policy adapts as the scene changes, here is a real-time visualization of the policy rollout (in blue). As the bookrack pose (the small moving frame) and the robot end-effector pose change, the policy adapts in real time.
A single demonstration of pouring cubes into a saucepan
Here is a two-minute non-stop video showing that the learned policy adapts
Here is an example of the policy continuously updating online
A single demonstration of a multi-step pick-and-place task
During execution, a human intervenes and moves the container
Adapting to a configuration different from the demonstration
During execution, a human intervenes and moves the object
A single demonstration of placing a book into a bookrack
We use modulation to perform obstacle avoidance. The obstacle (the yellow mustard bottle) is modeled as a sphere, with an RGBD camera tracking its position. To avoid losing tracking, the obstacle must be moved slowly. On the right is a visualization of the rollout from the learned policy, showing that it attempts to circumvent the obstacle from above.
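For reference, a minimal sketch of the standard modulation formulation for a spherical obstacle (in the spirit of Khansari-Zadeh and Billard's dynamical-system modulation) is shown below. This illustrates the general technique under our assumptions (sphere model, inflation margin); it is not the authors' exact implementation.

```python
# Sketch of dynamical-system modulation around a spherical obstacle.
# The nominal velocity xdot is deflected by M(x) = E D E^{-1}: the normal
# component shrinks to zero at the obstacle boundary, tangential components grow.
import numpy as np

def modulate(x, xdot, center, radius, margin=0.05):
    """Deflect a nominal velocity xdot at position x around a sphere."""
    d = x - center
    # Gamma > 1 outside the (inflated) obstacle, = 1 on its boundary.
    gamma = (np.linalg.norm(d) / (radius + margin)) ** 2
    gamma = max(gamma, 1.0 + 1e-6)
    n = d / np.linalg.norm(d)              # outward normal direction
    # Any two directions orthogonal to n span the tangent plane.
    t1 = np.cross(n, [0.0, 0.0, 1.0])
    if np.linalg.norm(t1) < 1e-6:          # n parallel to z: pick another axis
        t1 = np.cross(n, [0.0, 1.0, 0.0])
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    E = np.column_stack([n, t1, t2])
    # Eigenvalues: damp the normal component near the obstacle, amplify
    # the tangential ones, so the flow slides around rather than through.
    D = np.diag([1.0 - 1.0 / gamma, 1.0 + 1.0 / gamma, 1.0 + 1.0 / gamma])
    M = E @ D @ np.linalg.inv(E)
    return M @ xdot
```

Because the normal eigenvalue vanishes on the boundary, the modulated flow cannot penetrate the sphere, while far from the obstacle the modulation approaches the identity and the original policy is recovered.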
(Although our method is not designed for articulated objects, we attempt to show what is possible)
A single demonstration of placing a mustard bottle inside a closet and closing the closet
The object and closet are shifted