SSA: Improving Performance With a Better Scoring Function
arXiv:2508.14685v4 Announce Type: replace
Abstract: While transformer models exhibit strong in-context learning (ICL) abilities, they often fail to generalize under simple distribution shifts. We analyze these failures and identify Softmax, the scorin…