cs.CV

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation

arXiv:2602.13669v5 Announce Type: replace
Abstract: Recent multi-modal video generation models have achieved high visual quality, but their prohibitive latency and limited temporal stability hinder real-time deployment. Streaming inference exacerbates…