cs.CL, cs.LG

How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability

arXiv:2601.19208v2 Announce Type: replace-cross
Abstract: Semantic associations, such as the link between “bird” and “flew”, are foundational for language modeling, as they enable models to go beyond memorization and instead generalize and generate coher…