Learning Dynamic Bipedal Walking Across Stepping Stones
Helei Duan, Ashish Malik, Mohitvishnu S. Gadde, Jeremy Dao, Alan Fern, Jonathan Hurst. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Abstract
In this work, we propose a learning approach for 3D dynamic bipedal walking when footsteps are constrained to stepping stones. While recent work has shown progress on this problem, real-world demonstrations have been limited to relatively simple open-loop, perception-free scenarios. Our main contribution is a more advanced learning approach that enables real-world demonstrations, using the Cassie robot, of closed-loop dynamic walking over moderately difficult stepping-stone patterns. Our approach first uses reinforcement learning (RL) in simulation to train a controller that maps footstep commands onto joint actions without any reference motion information. We then learn a model of that controller's capabilities, which enables prediction of feasible footsteps given the robot's current dynamic state. The resulting controller and model are then integrated with a real-time overhead camera system for detecting stepping stone locations. For evaluation, we develop a benchmark set of stepping stone patterns, which we use to test performance in both simulation and the real world. Overall, we demonstrate that sim-to-real learning is extremely promising for enabling dynamic locomotion over stepping stones. We also identify remaining challenges that motivate important future research directions.
System Architecture
The policy network combines proprioceptive states processed by an LSTM dynamics module with footstep commands and a periodic clock signal. A feed-forward (FF) layer integrates these inputs to produce PD targets for motor control. This design allows task-specific inputs, such as commands, to adapt without altering the pretrained LSTM module, enabling flexible and reusable policy learning.
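The data flow above can be sketched in code. This is a minimal illustration, not the paper's implementation: the layer sizes, random weight initialization, and the sin/cos clock encoding are assumptions chosen for readability, and a toy pure-Python LSTM cell stands in for the trained dynamics module. The key structural point it shows is that the LSTM consumes only proprioceptive states, while the footstep command and clock join afterwards at the feed-forward output layer, so task inputs can change without touching the pretrained recurrent module.

```python
import math
import random

def matvec(W, x):
    """Matrix-vector product over plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class LSTMCell:
    """Toy LSTM cell standing in for the pretrained dynamics module."""
    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / math.sqrt(hid_dim)
        def mat(): return [[rng.uniform(-s, s) for _ in range(in_dim + hid_dim)]
                           for _ in range(hid_dim)]
        self.W = {g: mat() for g in "ifgo"}  # one weight matrix per gate

    def step(self, x, h, c):
        z = x + h  # concatenate input and previous hidden state
        i = [sigmoid(v) for v in matvec(self.W["i"], z)]
        f = [sigmoid(v) for v in matvec(self.W["f"], z)]
        g = [math.tanh(v) for v in matvec(self.W["g"], z)]
        o = [sigmoid(v) for v in matvec(self.W["o"], z)]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new

def clock_signal(phase):
    """Periodic clock input: sin/cos encoding of the gait phase in [0, 1)."""
    return [math.sin(2 * math.pi * phase), math.cos(2 * math.pi * phase)]

class Policy:
    """LSTM over proprioception; command + clock join at the FF output layer."""
    def __init__(self, proprio_dim=4, hid_dim=8, cmd_dim=3, n_motors=10, seed=0):
        rng = random.Random(seed)
        self.lstm = LSTMCell(proprio_dim, hid_dim, rng)
        out_in = hid_dim + cmd_dim + 2  # latent + footstep command + clock
        s = 1.0 / math.sqrt(out_in)
        self.W_out = [[rng.uniform(-s, s) for _ in range(out_in)]
                      for _ in range(n_motors)]
        self.h = [0.0] * hid_dim
        self.c = [0.0] * hid_dim

    def act(self, proprio, footstep_cmd, phase):
        # The LSTM sees proprioception only; task-specific inputs are
        # appended afterwards, so the recurrent module is reusable as-is.
        self.h, self.c = self.lstm.step(proprio, self.h, self.c)
        feats = self.h + footstep_cmd + clock_signal(phase)
        return matvec(self.W_out, feats)  # PD position targets, one per motor
```

A call such as `Policy().act(proprio, [0.3, 0.1, 0.0], 0.25)` returns one PD target per motor; the 10-motor default matches Cassie's actuated joints, while all other dimensions here are illustrative.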
Reachability Prediction Model
The reachability prediction model is a learned model that predicts reachable footstep locations from the robot's current dynamic state. It encapsulates the robot's dynamics in a latent state and predicts two key outcomes: the step error for the current footstep target and the latent robot state after the next touchdown. From these predictions, the model identifies regions where the robot can reliably step within a specified error threshold. It also supports multi-step lookahead by recursively predicting future latent states, which is especially useful for navigating highly constrained terrain. The model is trained with supervised learning on data relating the robot's state, footstep targets, and the resulting step errors, enabling robust performance across diverse scenarios.
Experiments
Results
(a) The training curve shows the benefit of using a pretrained dynamics layer, which yields higher reward and faster convergence. (b) Evaluating the policy partway through training (∼50 million) shows that the pretrained method outperforms a policy trained from scratch on the benchmark set of patterns. (c) The policy sees only the immediate next footstep command. Among the emergent behaviors, the learned step frequency lets the robot reach the next target step by taking a longer or shorter swing duration, and the policy also learns to raise the body height to enable longer steps.