cs.CV

TennisTV: Do Multimodal Large Language Models Understand Tennis Rallies?

arXiv:2509.15602v5 Announce Type: replace
Abstract: Multimodal large language models (MLLMs) excel at general video understanding but struggle with fast, high-frequency sports such as tennis, where rally clips are short yet information-dense. To systemat…