LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute
arXiv:2605.06809v1 Announce Type: new
Abstract: Transformers dominate video recognition. They split videos into tokens, and processing those tokens incurs a superlinear computational cost. Yet videos are filled with redundancy, so we can question the nee…