Learning World Models for Interactive Video Generation
arXiv:2505.21996v3 Announce Type: replace-cross
Abstract: Foundational world models must both be interactive and preserve spatiotemporal coherence to support effective future planning with action choices. However, present models for long video generation hav…