OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
arXiv:2604.17706v1 Announce Type: new
Abstract: Vision-Language-Action (VLA) models represent a paradigm shift in embodied AI, yet existing frameworks often struggle with imprecise spatial perception, suboptimal multimodal fusion, and instability in r…