Optimal Attention Temperature Improves the Robustness of In-Context Learning under Distribution Shift in High Dimensions
arXiv:2511.01292v2 Announce Type: replace
Abstract: Pretrained Transformers can perform in-context learning (ICL) from a few demonstrations, but this ability can fail sharply when the test distribution differs from pretraining, a common deployment set…
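The "attention temperature" the abstract refers to is a scalar that rescales the logits inside the attention softmax. A minimal sketch of temperature-scaled dot-product attention is below; the function name, the `temperature` parameter, and the NumPy setup are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def attention(q, k, v, temperature=1.0):
    """Scaled dot-product attention with an extra temperature knob.

    temperature > 1 flattens the softmax (more uniform weighting of
    in-context demonstrations); temperature < 1 sharpens it.
    (Illustrative sketch, not the paper's implementation.)
    """
    d = q.shape[-1]
    scores = q @ k.T / (temperature * np.sqrt(d))
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))   # one query token
k = rng.normal(size=(4, 8))   # four demonstration keys
v = rng.normal(size=(4, 8))

_, w_sharp = attention(q, k, v, temperature=0.5)
_, w_flat = attention(q, k, v, temperature=4.0)

# A higher temperature yields a flatter (higher-entropy) attention
# distribution over the demonstrations.
entropy = lambda p: float(-(p * np.log(p)).sum())
print(entropy(w_sharp[0]) < entropy(w_flat[0]))
```

Tuning this scalar at test time changes how strongly the model concentrates on a few demonstrations versus averaging over all of them, which is the lever the abstract connects to robustness under distribution shift.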