Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
arXiv:2605.04913v2 Announce Type: replace
Abstract: LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activatio…