AIM: Intent-Aware Unified world action Modeling with Spatial Value Maps
arXiv:2604.11135v1 Announce Type: new
Abstract: Pretrained video generation models provide strong priors for robot control, but existing unified world action models still struggle to decode reliable actions without substantial robot-specific training….