Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize
arXiv:2605.04396v1 Announce Type: new
Abstract: Recent work has shown that Transformers’ compositional generalization is governed by complexity control (initialization scale and weight decay), which steers training toward low-complexity reasonin…