cs.AI, cs.CV, cs.RO

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

arXiv:2604.07034v1 Announce Type: cross
Abstract: We present KITE, a training-free, keyframe-anchored, layout-grounded front-end that converts long robot-execution videos into compact, interpretable tokenized evidence for vision-language models (VLMs)…