Kai Hidajat, Solden Stoll, Joseph An

Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets

May 18, 2026

arXiv:2605.15787v1 Announce Type: new
Abstract: Why does a Transformer that has memorized its training set wait thousands of steps before it generalizes? Existing accounts locate this delay in norm minimization, feature emergence, or the late discover…
