Model Capacity Determines Grokking through Competing Memorisation and Generalisation Speeds
arXiv:2605.09724v1 Announce Type: new
Abstract: Existing accounts of grokking explain the phenomenon in terms of mechanistic frameworks such as circuit efficiency or lazy-to-rich transitions. However, despite a known dependence between grokking and mod…