cs.CV

Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning

arXiv:2605.04425v1 Announce Type: new
Abstract: Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete…