cs.CV

Unlocking Few-Shot Capabilities in LVLMs via Prompt Conditioning and Head Selection

arXiv:2603.24181v1 Announce Type: new
Abstract: Current Large Vision Language Models (LVLMs) excel at many zero-shot tasks such as image captioning, visual question answering, and OCR. However, these same models perform poorly at image class…