SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation
arXiv:2411.17061v2
Abstract: The Vision Transformer (ViT) has achieved notable success in computer vision, with its variants widely validated across various downstream tasks, including semantic segmentation. However, as general-…