ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers
arXiv:2505.20032v3 Announce Type: replace
Abstract: Tactile sensing provides essential local information, such as texture, compliance, and force, that complements visual perception. Despite recent advances in visuotactile representation learnin…