GEMS: Agent-Native Multimodal Generation with Memory and Skills
arXiv:2603.28088v1 Announce Type: new
Abstract: Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by…