Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
arXiv:2605.12789v1
Abstract: Large vision-language models (LVLMs) such as CLIP, Flamingo, and BLIP have reshaped AI by enabling joint understanding of text and images. These models excel at tasks like image caption…