InstrAct: Towards Action-Centric Understanding in Instructional Videos
arXiv:2604.08762v1 Announce Type: new
Abstract: Understanding instructional videos requires recognizing fine-grained actions and modeling their temporal relations, both of which remain challenging for current Video Foundation Models (VFMs). This difficulty s…