From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification
arXiv:2604.22190v1 Announce Type: cross
Abstract: CLIP-based person re-identification (ReID) methods aggregate spatial features into a single global [CLS] token optimized for image-text alignment rather than spatial selectivity, making repres…