TriRelVLA: Triadic Relational Structure for Generalizable Embodied Manipulation
arXiv:2605.05714v1 Announce Type: cross
Abstract: Vision-language-action (VLA) models perform well on robotic tasks seen during training but struggle to generalize to unseen scenes and objects. A key limitation lies in their implicit visual representations, …