Zhongyuan Bao, Lejun Zhang

TennisTV: Do Multimodal Large Language Models Understand Tennis Rallies?

Zhongyuan Bao, Lejun Zhang / April 17, 2026

arXiv:2509.15602v5 Announce Type: replace
Abstract: Multimodal large language models (MLLMs) excel at general video understanding but struggle with fast, high-frequency sports like tennis, where rally clips are short yet information-dense. To systemat…
