cs.AI, cs.LG, stat.ML

Multi-layer Cross-Attention is Provably Optimal for Multi-modal In-context Learning

arXiv:2602.04872v2 Announce Type: replace-cross
Abstract: Recent progress has rapidly advanced our understanding of the mechanisms underlying in-context learning in modern attention-based neural networks. However, existing results focus exclusively on…