Grokking of Diffusion Models: Case Study on Modular Addition
arXiv:2604.17673v1
Abstract: Despite their empirical success, how diffusion models generalize remains poorly understood from a mechanistic perspective. We demonstrate that diffusion models trained with flow-matching objectives exhib…