Intelligent agents progress by continually refining their capabilities through active exploration of their environments. Yet robot policies often lack sufficient exploration capability due to action mode collapse. Existing methods that encourage exploration typically rely on random perturbations, which are unsafe and induce unstable, erratic behaviors, thereby limiting their effectiveness. We propose Self-Improvement via On-Manifold Exploration (SOE), a framework that enhances policy exploration and improvement in robotic manipulation. SOE learns a compact latent representation of task-relevant factors and constrains exploration to the manifold of valid actions, ensuring safety, diversity, and effectiveness. It can be seamlessly integrated with arbitrary policy models as a plug-in module, augmenting exploration without degrading the base policy's performance. Moreover, the structured latent space enables human-guided exploration, further improving efficiency and controllability. Extensive experiments in both simulated and real-world tasks demonstrate that SOE consistently outperforms prior methods, achieving higher task success rates, smoother and safer exploration, and superior sample efficiency. These results establish on-manifold exploration as a principled approach to sample-efficient policy self-improvement.
We perform exploration in a compact latent space that is optimized to preserve only task-essential information from the observations while discarding irrelevant details, keeping exploration constrained to the task-relevant manifold.
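As a rough illustration of this idea, the sketch below perturbs a learned latent code rather than the raw action and decodes the result back into an action. The encoder/decoder architecture, dimensions, and noise scale are illustrative assumptions, not the implementation used in SOE.

```python
import torch
import torch.nn as nn

# Minimal sketch of on-manifold exploration in a learned latent space.
# All names, sizes, and the Gaussian perturbation scale are assumptions
# made for illustration, not the paper's actual architecture.

class LatentActionModel(nn.Module):
    def __init__(self, obs_dim=64, act_dim=7, latent_dim=8):
        super().__init__()
        # Encoder compresses (observation, action) into a compact latent code
        # trained to retain only task-essential factors of variation.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder maps (observation, latent code) back to an action, so every
        # decoded action stays on the learned action manifold.
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def explore(self, obs, base_action, noise_scale=0.1):
        """Perturb the latent code instead of the raw action."""
        z = self.encoder(torch.cat([obs, base_action], dim=-1))
        z_perturbed = z + noise_scale * torch.randn_like(z)
        return self.decoder(torch.cat([obs, z_perturbed], dim=-1))


model = LatentActionModel()
obs = torch.randn(1, 64)
base_action = torch.randn(1, 7)
explored_action = model.explore(obs, base_action)  # diverse, yet near the action manifold
```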
Implemented as a plug-in module on top of existing imitation learning policies, SOE enables diverse action generation without compromising the base policy's performance.
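A minimal sketch of how such a plug-in might wrap a frozen base policy is shown below, reusing the `LatentActionModel` sketched above; the wrapper interface and names are assumptions rather than the paper's actual API.

```python
import torch

# Hypothetical plug-in wrapper: the base policy produces a nominal action and the
# exploration module (e.g. the LatentActionModel sketched above) only perturbs its
# latent code. Interface names here are illustrative assumptions.

class ExplorationWrapper:
    def __init__(self, base_policy, latent_model):
        self.base_policy = base_policy    # pretrained imitation-learning policy, kept frozen
        self.latent_model = latent_model  # on-manifold exploration module

    @torch.no_grad()
    def act(self, obs, explore=True):
        base_action = self.base_policy(obs)  # base behavior is untouched when explore=False
        if not explore:
            return base_action
        # Perturbation is applied in latent space, so the returned action stays
        # on the learned manifold while adding diversity around the base action.
        return self.latent_model.explore(obs, base_action)
```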
We evaluate our method on manipulation tasks in both simulation and the real world. Compared to prior exploration methods, our approach consistently achieves higher success rates, smoother motions, and fewer required rollouts, highlighting the sample efficiency of on-manifold exploration.
Our experiments also reveal that the learned latent space naturally disentangles task-relevant factors into several distinct dimensions, each corresponding to a specific pattern of variation. This property enables human-guided exploration, further improving efficiency and controllability.
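To make this concrete, the hypothetical snippet below nudges a single latent dimension chosen by a human operator, again building on the `LatentActionModel` sketch; the dimension indices and factor labels are invented for illustration and are task-dependent in practice.

```python
import torch

# Illustrative sketch of human-guided exploration with a disentangled latent space.
# The factor labels and dimension indices are hypothetical examples.

@torch.no_grad()
def guided_explore(latent_model, obs, base_action, dim, delta):
    """Perturb a single latent dimension selected by a human operator."""
    z = latent_model.encoder(torch.cat([obs, base_action], dim=-1))
    z_guided = z.clone()
    z_guided[..., dim] += delta  # nudge only the selected factor of variation
    return latent_model.decoder(torch.cat([obs, z_guided], dim=-1))

# Example mapping from human-readable factors to latent dimensions (hypothetical):
FACTORS = {"grasp_height": 0, "approach_angle": 1}
# action = guided_explore(model, obs, base_action, FACTORS["grasp_height"], delta=0.5)
```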
@misc{jin2025soe,
title={SOE: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration},
author={Yang Jin and Jun Lv and Han Xue and Wendi Chen and Chuan Wen and Cewu Lu},
year={2025},
eprint={2509.19292},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2509.19292},
}