Mema: Memory-Augmented Adapter for Enhanced Vision-Language Understanding
arXiv:2603.00655v2
Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable performance by aligning pretrained visual representations with the linguistic knowledge embedded in Large Language Models (LLMs). How…