ALADIN: Attribute-Language Distillation Network for Person Re-Identification
arXiv:2603.21482v2 Announce Type: replace
Abstract: Recent vision-language models such as CLIP provide strong cross-modal alignment, but current CLIP-guided ReID pipelines rely on global features and fixed prompts. This limits their ability to capture…