cs.LG

Rethinking the Global Knowledge of CLIP in Training-Free Open-Vocabulary Semantic Segmentation

arXiv:2502.06818v3 Announce Type: replace
Abstract: Recent works modify CLIP to perform training-free open-vocabulary semantic segmentation (TF-OVSS). In vanilla CLIP, patch-wise image representations mainly encode homogeneous image-level …
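
For readers unfamiliar with the setting, the generic training-free recipe that the abstract refers to can be sketched as follows: each image patch is assigned the class whose text embedding has the highest cosine similarity to that patch's embedding. This is only a minimal illustration of that general idea, not the method proposed in the paper. The function names `l2_normalize` and `segment_patches` are hypothetical, and the toy 2-D vectors are assumed stand-ins for real CLIP patch and text embeddings.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def segment_patches(patch_feats, text_feats, grid_h, grid_w):
    """Label each patch with the index of its most similar class embedding.

    patch_feats: grid_h * grid_w patch embeddings in row-major order
                 (assumed to come from a CLIP-like image encoder).
    text_feats:  one embedding per class name
                 (assumed to come from a CLIP-like text encoder).
    Returns a grid_h x grid_w nested list of class indices.
    """
    assert len(patch_feats) == grid_h * grid_w
    text_n = [l2_normalize(t) for t in text_feats]
    labels = []
    for p in patch_feats:
        p_n = l2_normalize(p)
        # Cosine similarity is the dot product of unit-length vectors.
        sims = [sum(a * b for a, b in zip(p_n, t)) for t in text_n]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    # Reshape the flat label list back into the patch grid.
    return [labels[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy data: a 2x2 patch grid and two classes (hypothetical embeddings).
patches = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]]
texts = [[1.0, 0.0], [0.0, 1.0]]
mask = segment_patches(patches, texts, 2, 2)
print(mask)  # [[0, 0], [1, 1]]
```

In practice the per-patch label grid would be upsampled to the input resolution to obtain a pixel-level segmentation map.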