Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
arXiv:2605.12305v1 Announce Type: new
Abstract: While recent advancements in multimodal language models have enabled image generation from expressive multi-image instructions, existing methods struggle to maintain performance under complex interleaved…