cs.CL, cs.CV

ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting

arXiv:2604.18452v1 Announce Type: new
Abstract: Vision-language modeling is rapidly increasing in popularity, with an ever-expanding list of available models. In most cases, these vision-language models have tens of billions of parameters, which is…