VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
arXiv:2604.07634v2 Announce Type: replace
Abstract: Streaming vision-language models (VLMs) continuously generate responses from an instruction prompt and an online stream of input frames, a core mechanism for real-time visual assistants. Exi…