Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization
arXiv:2605.12756v1 Announce Type: cross
Abstract: Large language models (LLMs) are pretrained by minimizing the cross-entropy loss for next-token prediction. In this paper, we study whether this optimization strategy can induce geometric structure in …
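For readers less familiar with the pretraining objective the abstract names, here is a minimal sketch of the next-token cross-entropy loss, assuming one row of vocabulary logits per input position. It illustrates only the standard training loss, not the paper's layer-peeled analysis. The function name `next_token_cross_entropy` and the plain-list inputs are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def next_token_cross_entropy(logits: Sequence[Sequence[float]],
                             tokens: Sequence[int]) -> float:
    """Hypothetical helper: mean cross-entropy of predicting tokens[t + 1] from logits[t].

    Illustrative sketch of the standard LLM pretraining loss, not code from the paper.
    """
    if len(logits) != len(tokens) or len(tokens) < 2:
        raise ValueError("need one logit row per token and at least two tokens")
    total = 0.0
    # Row t scores the vocabulary for the token at position t + 1,
    # so the final row has no target and is skipped.
    for t in range(len(tokens) - 1):
        row = logits[t]
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_partition = m + math.log(sum(math.exp(x - m) for x in row))
        # -log softmax(row)[target] = log Z - row[target]
        total += log_partition - row[tokens[t + 1]]
    return total / (len(tokens) - 1)


# Usage: uniform logits over a 4-token vocabulary give a loss of ln 4.
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
loss = next_token_cross_entropy(uniform, [0, 1, 2])
```

Averaging over the positions that have a target mirrors how the loss is usually reported per token. Minimizing it pushes each row's softmax toward the observed next token.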