CubeDAgger: Interactive Imitation Learning for Dynamic Systems with Efficient yet Low-risk Interaction
arXiv:2505.04897v2 Announce Type: replace-cross
Abstract: Interactive imitation learning makes an agent's control policy robust through stepwise supervision from an expert. Recent algorithms mostly employ expert-agent switching systems, which reduce the expert's burden by restricting supervision to selected timings. However, this approach is useful only for static tasks; in dynamic tasks, timing discrepancies cause abrupt changes in actions, and the robot loses its dynamic stability. This paper therefore proposes a novel method, named CubeDAgger, which improves robustness with fewer dynamic stability violations even for dynamic tasks. The proposed method builds on a baseline, EnsembleDAgger, with three improvements. The first adds a regularization term that explicitly activates the threshold used to decide the supervision timing. The second transforms the expert-agent switching system into an optimal consensus system of multiple action candidates. The third injects autoregressive colored noise into the agent's actions for time-consistent exploration. These improvements are verified in simulations, which show that the trained policies are sufficiently robust while maintaining dynamic stability during interaction. Finally, real-robot scooping experiments with a human expert demonstrate that the proposed method can learn robust policies from scratch from just 30 minutes of interaction. https://youtu.be/kBl3SCTnVEM
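
To illustrate what "autoregressive colored noise" for time-consistent exploration can look like, the following is a minimal sketch using a first-order autoregressive (AR(1)) process. The abstract does not specify the noise model, so the AR(1) form, the function names `ar1_noise` and `lag1_autocorr`, and the parameters `beta` and `sigma` are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def ar1_noise(n_steps, dim, beta=0.9, sigma=1.0, seed=0):
    """Sample AR(1) noise (an assumed stand-in for the paper's colored noise).

    eps[t] = beta * eps[t-1] + sqrt(1 - beta**2) * sigma * w[t],  w[t] ~ N(0, I)

    beta = 0 gives white noise; beta close to 1 gives slowly varying noise.
    The sqrt(1 - beta**2) factor keeps the stationary std equal to sigma.
    """
    rng = np.random.default_rng(seed)
    eps = np.zeros((n_steps, dim))
    eps[0] = sigma * rng.standard_normal(dim)
    scale = sigma * np.sqrt(1.0 - beta**2)
    for t in range(1, n_steps):
        eps[t] = beta * eps[t - 1] + scale * rng.standard_normal(dim)
    return eps


def lag1_autocorr(x):
    """Lag-1 autocorrelation, pooled over all dimensions."""
    x = x - x.mean(axis=0)
    return float((x[1:] * x[:-1]).sum() / (x * x).sum())


if __name__ == "__main__":
    colored = ar1_noise(5000, 2, beta=0.9)
    white = ar1_noise(5000, 2, beta=0.0)
    print("lag-1 autocorrelation (beta=0.9):", lag1_autocorr(colored))
    print("lag-1 autocorrelation (beta=0.0):", lag1_autocorr(white))
```

In this sketch the noise would be added to the agent's action at each control step. Because consecutive samples are correlated, the perturbation drifts smoothly instead of jittering, which is the intuition behind time-consistent exploration in a dynamic task.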