cs.CV

ClustViT: Clustering-based Token Merging for Semantic Segmentation

arXiv:2510.01948v2 Announce Type: replace
Abstract: Vision Transformers can achieve high accuracy and strong generalization across various contexts, but their practical applicability to real-world robotic systems is limited by their quadratic atte…