DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models
arXiv:2604.15016v2 Announce Type: replace
Abstract: EEG foundation models (EFMs) achieve strong cross-subject and cross-task generalization through large-scale pretraining and downstream fine-tuning. Through empirical analysis, we observe that (i) task-adapted EFMs provide strong decoding performance but incur substantial overhead when retained as inference backbones, making knowledge distillation a natural route for training compact students; and (ii) direct distillation from a fixed teacher representation underutilizes EFM knowledge, as task-discriminative information is distributed across intermediate layers rather than concentrated in the final layer. These observations motivate DLink (Distilling Layer-wise and Dominant Knowledge), a spectrally guided distillation framework with input-conditioned layer routing for transferring EFM knowledge into compact students. DLink uses a lightweight router to aggregate teacher layers for each input, and aligns magnitude and phase spectra to mitigate compression-induced spectral distortion in the learned representations. The routed teacher knowledge is internalized by a project-then-compress student; the teacher and router are used only during training. Experiments on four EEG benchmarks show that DLink improves architecture-matched compact students and remains competitive with lightweight baselines, narrowing the gap to fine-tuned EFMs while substantially reducing parameters, FLOPs, and CPU-only inference latency.
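
The abstract does not include implementation details; as a rough illustration only, the following PyTorch-style sketch shows one plausible form of the two mechanisms it names: an input-conditioned router that softly aggregates teacher layers, and a loss aligning magnitude and phase spectra of student features against the routed teacher features. All identifiers here (LayerRouter, spectral_alignment_loss, x_summary) are hypothetical and not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerRouter(nn.Module):
    """Input-conditioned soft weighting over L teacher layers (assumed design)."""
    def __init__(self, feat_dim: int, num_layers: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, num_layers)

    def forward(self, x_summary, teacher_feats):
        # teacher_feats: (L, B, T, D) stacked hidden states from the frozen teacher
        # x_summary:     (B, D) per-input summary vector
        w = F.softmax(self.score(x_summary), dim=-1)   # (B, L) per-input layer weights
        w = w.permute(1, 0)[:, :, None, None]          # (L, B, 1, 1) for broadcasting
        return (w * teacher_feats).sum(dim=0)          # (B, T, D) routed teacher target

def spectral_alignment_loss(student_feat, routed_feat):
    """Align magnitude and phase spectra along the temporal axis (assumed form)."""
    fs = torch.fft.rfft(student_feat, dim=1)
    ft = torch.fft.rfft(routed_feat, dim=1)
    mag = F.l1_loss(fs.abs(), ft.abs())
    # wrap-aware phase term: 1 - cos of the phase difference
    phase = (1 - torch.cos(torch.angle(fs) - torch.angle(ft))).mean()
    return mag + phase

# Toy usage with made-up shapes: L teacher layers, batch B, time T, feature dim D.
L, B, T, D = 12, 4, 256, 64
teacher_feats = torch.randn(L, B, T, D)                   # frozen teacher hidden states
x_summary = teacher_feats[0].mean(dim=1)                  # (B, D) illustrative input summary
router = LayerRouter(D, L)
routed = router(x_summary, teacher_feats)                 # routed teacher knowledge
student_feat = torch.randn(B, T, D, requires_grad=True)   # projected student features
loss = spectral_alignment_loss(student_feat, routed)      # used only at training time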