A Study on Hidden Layer Distillation for Large Language Model Pre-Training
arXiv:2605.11513v1
Abstract: Knowledge Distillation (KD) is a critical tool for training Large Language Models (LLMs), yet the majority of research focuses on approaches that rely solely on output logits, neglecting semantic information…
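
To make the distinction the abstract draws concrete, the sketch below contrasts logit-only KD (a KL divergence between temperature-softened output distributions) with hidden-layer distillation (matching intermediate representations). This is a minimal PyTorch sketch under assumed names (logit_kd_loss, hidden_kd_loss, proj, temperature); it illustrates the two loss families in general, not the specific method of this paper.

    # Minimal sketch: logit-only KD vs. hidden-layer KD.
    # All names below are illustrative assumptions, not the paper's method.
    import torch
    import torch.nn.functional as F

    def logit_kd_loss(student_logits, teacher_logits, temperature=2.0):
        """Classic KD: KL divergence between softened output distributions."""
        s = F.log_softmax(student_logits / temperature, dim=-1)
        t = F.softmax(teacher_logits / temperature, dim=-1)
        # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

    def hidden_kd_loss(student_hidden, teacher_hidden, proj):
        """Hidden-layer KD: match intermediate representations.

        `proj` is a learned linear map aligning the student's hidden width
        with the teacher's (needed when the two widths differ).
        """
        return F.mse_loss(proj(student_hidden), teacher_hidden)

In practice the two losses are typically combined with a weighting coefficient, since hidden-state matching supplies a denser training signal while the logit term keeps the student's output distribution anchored to the teacher's.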