cs.CL

MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation

arXiv:2605.01374v1 Announce Type: new
Abstract: Knowledge distillation is a key technique for compressing large language models (LLMs), but most existing methods align representations at fixed layers or token-level outputs, ignoring how representation…
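To make the baseline the abstract contrasts against concrete, below is a minimal sketch of conventional token-level distillation (matching the teacher's per-token output distribution with a temperature-scaled KL divergence). This is an illustrative assumption about the standard setup, not the MTA method described in the paper; the function name, temperature, and tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def token_level_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) averaged over token positions.

    Both logit tensors are assumed to have shape [batch, seq_len, vocab_size].
    """
    # Soften both distributions with the same temperature.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so the gradient magnitude is comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)
```

Methods of this kind supervise only the final output distribution at each token (or hidden states at fixed layers), which is the limitation the abstract points to.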