Self-improvement requires a robotic system to first learn from human-provided data and then progressively enhance its capabilities through interaction with the environment, much as humans refine their skills through continued practice. Achieving effective self-improvement is challenging, however, mainly because robots tend to repeat behaviors they can already perform during interaction and thus often fail to generate new, valuable data for learning. In this paper, we identify two keys to successful self-improvement: modal-level exploration and data selection. By incorporating a modal-level exploration mechanism during policy execution, the robot produces more diverse, multi-modal interactions. At the same time, we select the most valuable trials, and the high-quality segments within them, for learning. We demonstrate effective robot self-improvement on both simulation benchmarks and real-world experiments. This self-improvement capability will enable us to develop more robust, higher-success-rate robotic control strategies at lower cost.
Starting from a robot policy learned from human-provided demonstrations, we collect and select the most valuable trajectories and segments, and then refine the policy.
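The selection step can be illustrated with a small sketch. The page does not spell out how "valuable" trials or "high-quality" segments are scored, so the criteria below are hypothetical stand-ins: a trial is kept if it succeeded and ranks high on an assumed `novelty` score, and a segment is any contiguous run of steps whose assumed per-step `step_quality` clears a threshold. The names `Trajectory`, `select_trials`, and `select_segments` are illustrative only, not SIME's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    success: bool              # whether the rollout completed the task
    novelty: float             # HYPOTHETICAL score: how different this rollout is from existing data
    step_quality: List[float]  # HYPOTHETICAL per-step quality scores in [0, 1]


def select_trials(trajs: List[Trajectory], k: int) -> List[Trajectory]:
    """Keep the k most novel successful rollouts (an assumed notion of 'value')."""
    successful = [t for t in trajs if t.success]
    return sorted(successful, key=lambda t: t.novelty, reverse=True)[:k]


def select_segments(traj: Trajectory, threshold: float, min_len: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) ranges of consecutive steps with quality >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, q in enumerate(traj.step_quality):
        if q >= threshold and start is None:
            start = i  # a good run begins
        elif q < threshold and start is not None:
            if i - start >= min_len:  # keep the run only if it is long enough
                segments.append((start, i))
            start = None
    # close a run that lasts until the end of the trajectory
    if start is not None and len(traj.step_quality) - start >= min_len:
        segments.append((start, len(traj.step_quality)))
    return segments


# Usage on made-up rollouts
trajs = [
    Trajectory(True, 0.9, [0.9, 0.8, 0.2, 0.95, 0.9, 0.9]),
    Trajectory(False, 0.99, [1.0] * 4),  # failed, so never selected
    Trajectory(True, 0.3, [0.7] * 3),
]
best = select_trials(trajs, k=1)
segments = select_segments(best[0], threshold=0.5, min_len=2)
print(segments)  # [(0, 2), (3, 6)]
```

The selected segments would then be added to the training set before the policy is refined; filtering at both the trial and the segment level keeps useful sub-behaviors from rollouts that also contain poor steps.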
Previous work and our experiments show that naively fine-tuning policies on self-collected interaction data is often ineffective. This is because imitation learning policies tend to produce repetitive or near-deterministic behaviors, which limits the diversity of the collected data.
Our key finding in this paper is that introducing modal-level exploration during interaction significantly increases the diversity of the self-collected data and improves the effectiveness of self-improvement.
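Why exploration at the level of behavior modes matters for data diversity can be seen in a toy model. Here a policy is assumed to have a distribution over three discrete behavior modes (the probabilities in `MODE_PROBS` are made up), and each rollout is summarized only by the mode it commits to. A greedy policy always executes its most likely mode, while an exploring policy samples a mode per rollout. This is an illustrative assumption, not SIME's actual exploration mechanism; the diversity of the collected data is measured as the Shannon entropy of the mode histogram.

```python
import math
import random
from collections import Counter
from typing import List

# HYPOTHETICAL policy belief over 3 behavior modes (e.g., different grasp strategies)
MODE_PROBS = [0.7, 0.2, 0.1]


def collect(n: int, explore: bool, rng: random.Random) -> List[int]:
    """Return the mode used by each of n rollouts."""
    if explore:
        # Illustrative modal-level exploration: sample one mode per rollout
        return rng.choices(range(len(MODE_PROBS)), weights=MODE_PROBS, k=n)
    # Greedy imitation policy: always executes its most likely mode
    best = max(range(len(MODE_PROBS)), key=lambda m: MODE_PROBS[m])
    return [best] * n


def mode_entropy(modes: List[int]) -> float:
    """Shannon entropy (nats) of the empirical mode distribution."""
    n = len(modes)
    return -sum(c / n * math.log(c / n) for c in Counter(modes).values())


rng = random.Random(0)
greedy = collect(200, explore=False, rng=rng)
explored = collect(200, explore=True, rng=rng)
print(len(set(greedy)), mode_entropy(greedy))      # a single mode, zero entropy
print(len(set(explored)), mode_entropy(explored))  # several modes, positive entropy
```

In this toy setting, fine-tuning on the greedy rollouts can only reinforce the one mode the policy already prefers, whereas the explored rollouts also cover the less likely modes.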
For example, in the real-world cup stacking task,
without the introduced
While with the
We are also surprised to find that after the introduction of
By training the policy with self-collected data and performing only one round of
As the number of iterations increases, the advantage of
We conduct a thorough ablation study to validate the effectiveness of several key components in SIME. Our studies also highlight the critical role of
@misc{jin2025sime,
title={SIME: Enhancing Policy Self-Improvement with Modal-level Exploration},
author={Yang Jin and Jun Lv and Wenye Yu and Hongjie Fang and Yong-Lu Li and Cewu Lu},
year={2025},
eprint={2505.01396},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2505.01396},
}