cs.LG, stat.ML

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

arXiv:2605.06609v1 Announce Type: cross
Abstract: Transformers have demonstrated remarkable in-context learning (ICL) capabilities. The strong ICL performance of transformers is commonly believed to arise from their ability to implicitly execute certa…