Towards Temporal Compositional Reasoning in Long-Form Sports Videos
arXiv:2604.22226v1 Announce Type: new
Abstract: Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-hor…