cs.AI, cs.LG

Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize

arXiv:2605.04396v1 Announce Type: new
Abstract: Recent work has shown that Transformers’ compositional generalization is governed by \emph{complexity control} (initialization scale and weight decay), which steers training toward low-complexity reasonin…