Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models
arXiv:2604.17873v1 Announce Type: new
Abstract: Video Large Language Models (Vid-LLMs) have demonstrated remarkable performance in video understanding tasks, yet their robustness under conversational interaction remains largely underexplored. In this …