ESsEN: Training Compact Discriminative Vision-Language Transformers in a Low-Resource Setting
arXiv:2604.18452v1 Announce Type: new
Abstract: Vision-language modeling is rapidly gaining popularity, with an ever-expanding list of available models. In most cases, these vision-language models have tens of billions of parameters, which is…