Elastic Motion Policy: An Adaptive Dynamical System for Robust and Efficient One-Shot Imitation Learning

GRASP Lab, University of Pennsylvania

Abstract

Behavior cloning (BC) has become a staple imitation learning paradigm in robotics because it teaches robots complex skills directly from expert demonstrations. However, BC suffers from an inherent generalization problem, and the status quo solution is to gather more data. Yet regardless of how much training data is available, out-of-distribution performance remains sub-par, lacks any formal guarantee of convergence and success, and cannot accommodate or recover from physical interaction with humans. These are critical flaws when robots are deployed in ever-changing, human-centric environments. We therefore propose Elastic Motion Policy (EMP), a one-shot imitation learning framework that lets robots adjust their behavior to scene changes while respecting the task specification. Trained from a single demonstration, EMP follows the dynamical-systems paradigm, in which motion planning and control are governed by first-order differential equations with convergence guarantees. We leverage Laplacian editing in the full end-effector space, $\mathbb{R}^3\times SO(3)$, and online convex learning of Lyapunov functions to adapt EMP to new contexts online, avoiding the need to collect new demonstrations. We extensively validate our framework in real robot experiments, demonstrating robust and efficient performance in dynamic environments, with obstacle avoidance and multi-step task capabilities.
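
To make the paradigm concrete, here is a minimal sketch of a stable first-order dynamical-system policy (this is the standard formulation from the DS literature; the notation is ours, not taken verbatim from the paper):

$$\dot{x} = f(x), \qquad x \in \mathbb{R}^3 \times SO(3),$$

where trajectories converge to the attractor $x^*$ (e.g., the goal pose) whenever a Lyapunov function $V$ certifies stability:

$$V(x^*) = 0, \qquad V(x) > 0 \;\; \forall x \neq x^*, \qquad \nabla V(x)^\top f(x) < 0 \;\; \forall x \neq x^*.$$

Learning $V$ by convex optimization is what makes it cheap enough to re-certify stability online after the demonstration has been edited to a new scene.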



Experiment: Book Placement

Single Demonstration

A single demonstration of placing the book into a bookrack

Deploy and Adapt

Here is a non-stop video showing that the learned policy can adapt. Note that in this video, the robot starts executing once the object's movement settles down. We also show that the learned policy can handle perturbations.

Visualizing Real-Time Rollout

To illustrate how the policy adapts as the scene changes, here is a real-time visualization of the policy rollout (in blue). As the bookrack pose (the small moving frame) and the robot end-effector pose change, the policy adapts in real time.
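
This online adaptation relies on Laplacian editing of the demonstrated path. Below is a minimal sketch of the position-only case in $\mathbb{R}^3$; the function name, the soft-constraint weighting, and the path-graph Laplacian are our illustrative assumptions, and the paper additionally handles orientation in $SO(3)$:

```python
import numpy as np

def laplacian_edit(traj, constraints):
    """Adapt a demonstrated trajectory to new waypoint constraints while
    preserving its local shape (its Laplacian coordinates).

    traj        : (N, 3) demonstrated waypoints in R^3
    constraints : dict {index: new_position (3,)}, e.g. the newly tracked
                  grasp and bookrack poses
    """
    n = len(traj)
    # Path-graph Laplacian: each interior point minus the mean of its neighbors.
    L = np.zeros((n, n))
    for i in range(1, n - 1):
        L[i, i] = 1.0
        L[i, i - 1] = L[i, i + 1] = -0.5
    delta = L @ traj  # Laplacian (shape) coordinates of the demonstration

    # Soft positional constraints appended as extra least-squares rows.
    w = 1e3  # large weight => near-hard constraints
    rows, rhs = [L], [delta]
    for i, p in constraints.items():
        e = np.zeros((1, n)); e[0, i] = w
        rows.append(e); rhs.append(w * np.asarray(p)[None, :])
    A, b = np.vstack(rows), np.vstack(rhs)
    # The edited trajectory keeps the demo's local shape while meeting the
    # new constraints.
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Example: pin the last waypoint to a shifted bookrack position.
demo = np.stack([np.linspace(0, 1, 50), np.zeros(50),
                 np.sin(np.linspace(0, np.pi, 50))], axis=1)
edited = laplacian_edit(demo, {0: demo[0], 49: demo[-1] + [0.2, 0.1, 0.0]})
```

The least-squares solve preserves the demonstration's local geometry while pinning selected waypoints to their newly tracked locations, which is why the rollout can follow the bookrack as it moves.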



Experiment: Pouring Cubes

Single Demonstration

A single demonstration of pouring cubes into a saucepan

Long Video

Here is a two-minute non-stop video showing that the learned policy can adapt.

More Adaptation

Here is an example of the policy continually updating online.



Experiment: Multi-Step Pick and Place

Single Demonstration

A single demonstration of a multi-step pick-and-place task

Adapt to Interruption

During execution, a human interrupts and moves the container

Adapt to New Configuration

The policy adapts to a configuration different from the demonstration

Adapt to Interruption

During execution, a human interrupts and moves the object



Obstacle Avoidance

Single Demonstration

A single demonstration of placing the book into a bookrack

Applying Modulation

We use modulation to perform obstacle avoidance. The obstacle (the yellow mustard bottle) is modeled as a sphere, with an RGBD camera tracking its position. To avoid losing tracking, the obstacle must be moved slowly. On the right is a visualization of the rollout from the learned policy, showing that it circumvents the obstacle from above.
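
For reference, dynamic modulation for a sphere obstacle can be sketched as follows. This follows the standard modulation construction from the DS obstacle-avoidance literature (Khansari-Zadeh and Billard); the function and parameter names are our assumptions, not the paper's API:

```python
import numpy as np

def modulate(x_dot, x, center, radius, margin=0.05):
    """Modulate a nominal DS velocity so the flow avoids a sphere obstacle.

    x_dot  : nominal velocity from the learned policy at position x
    center : tracked obstacle center (e.g., the mustard bottle)
    radius : sphere radius, inflated by a safety margin
    """
    r = radius + margin
    d = x - center
    dist = np.linalg.norm(d)
    if dist < 1e-9:  # degenerate: at the obstacle center; leave velocity unchanged
        return x_dot
    gamma = max(dist**2 / r**2, 1.0 + 1e-6)  # > 1 outside the obstacle

    # Basis: first axis along the obstacle normal, rest spanning the tangent plane.
    n = d / dist
    E, _ = np.linalg.qr(np.column_stack([n, np.eye(3)]))

    # Shrink the normal component near the obstacle and inflate the tangential
    # ones, so the flow slides around the sphere instead of penetrating it.
    lam = np.diag([1.0 - 1.0 / gamma, 1.0 + 1.0 / gamma, 1.0 + 1.0 / gamma])
    M = E @ lam @ E.T
    return M @ x_dot

# Example: a nominal velocity toward the goal gets deflected around the bottle.
v_safe = modulate(x_dot=np.array([0.1, 0.0, 0.0]), x=np.array([0.0, 0.05, 0.0]),
                  center=np.array([0.15, 0.0, 0.0]), radius=0.05)
```

At the obstacle surface the normal eigenvalue reaches zero, so the modulated velocity becomes purely tangential and the rollout flows around the sphere rather than through it.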



Extra Case Study: Articulated Objects

(Although our method is not designed for articulated objects, we attempt this task to show the possibility.)


Single Demonstration

A single demonstration of placing the mustard bottle inside a closet and closing the closet

Deploy and Adapt

The object and closet are shifted