cs.CV, cs.LG, cs.RO

ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers

arXiv:2505.20032v3 Announce Type: replace
Abstract: Tactile sensing provides essential local information, such as texture, compliance, and force, that is complementary to visual perception. Despite recent advances in visuotactile representation learnin…