cs.CV, cs.LG

Text-Conditional JEPA for Learning Semantically Rich Visual Representations

arXiv:2605.03245v1 Announce Type: new
Abstract: Image-based Joint-Embedding Predictive Architecture (I-JEPA) offers a promising approach to visual self-supervised learning through masked feature prediction. However, with the inherent visual uncertainty…
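For readers new to the baseline, the "masked feature prediction" objective of I-JEPA can be sketched as follows. An online encoder embeds the visible context patches. A predictor, given the positions of masked target patches, regresses the features that a separate target encoder produces for those patches. The loss is a squared error in representation space rather than pixel space, and the target encoder is an exponential moving average (EMA) of the online encoder. This is a minimal toy sketch under stated assumptions, not the paper's text-conditional method and not I-JEPA's actual implementation. The helper names (`linear`, `ema_update`, `masked_feature_loss`), the per-patch linear "encoders", the mean-pooled context, and the 0.996 momentum are all hypothetical simplifications chosen for illustration. The real predictor is a transformer over mask tokens.

```python
import random

def linear(weights, x):
    # Toy stand-in for a network layer: returns W @ x, with W given as a list of rows.
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]

def ema_update(target, online, momentum=0.996):
    # Target-encoder update: target <- m * target + (1 - m) * online (elementwise).
    # The 0.996 default is an assumed, typical value, not taken from this paper.
    return [[momentum * t + (1 - momentum) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target, online)]

def masked_feature_loss(patches, context_idx, target_idx, enc, target_enc, predictor, pos):
    # 1) Encode only the visible (context) patches with the online encoder.
    ctx = [linear(enc, patches[i]) for i in context_idx]
    dim = len(ctx[0])
    # 2) Toy simplification (assumption): mean-pool the context into one summary vector.
    summary = [sum(v[d] for v in ctx) / len(ctx) for d in range(dim)]
    loss = 0.0
    for j in target_idx:
        # 3) Predict target-patch features from the context summary plus that patch's positional embedding.
        pred = linear(predictor, [s + p for s, p in zip(summary, pos[j])])
        # 4) Regression target: target-encoder features of the masked patch
        #    (in real training these receive no gradient).
        tgt = linear(target_enc, patches[j])
        # 5) Mean squared error in feature space, not pixel space.
        loss += sum((a - b) ** 2 for a, b in zip(pred, tgt)) / dim
    return loss / len(target_idx)

# Usage on random toy data: 8 patches of 4 values each, 3-dimensional features.
random.seed(0)
D_IN, D = 4, 3
patches = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(8)]
pos = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(8)]
enc = [[random.gauss(0, 0.5) for _ in range(D_IN)] for _ in range(D)]
target_enc = [row[:] for row in enc]  # target encoder starts as a copy of the online encoder
predictor = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(D)]

loss = masked_feature_loss(patches, [0, 1, 2, 3, 4], [5, 6, 7], enc, target_enc, predictor, pos)
print(f"toy masked-feature loss: {loss:.4f}")
target_enc = ema_update(target_enc, enc)  # after an optimizer step on enc/predictor
```

Predicting in feature space lets the model ignore unpredictable pixel-level detail. The EMA target encoder is the standard way JEPA-style methods avoid trivially collapsed targets.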