Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization
arXiv:2604.03110v1 Announce Type: new
Abstract: Knowledge distillation is an effective technique for pre-trained language model compression. However, existing methods focus only on the knowledge distribution among layers, which may cause the loss of f…
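
Since the abstract is truncated and the paper's specific method is not shown here, the following is only a minimal generic sketch of the two ingredients the title names: logit-based knowledge distillation and low-rank factorization of a weight matrix. The loss form, the `temperature` and `rank` parameters, and the SVD-based factorization are illustrative assumptions, not the authors' algorithm.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard KL-divergence distillation loss on temperature-softened logits."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)

def low_rank_factorize(weight, rank):
    """Factor an (out, in) weight matrix into rank-`rank` factors via truncated SVD."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out, rank): left factor scaled by singular values
    B = Vh[:rank, :]             # (rank, in): right factor
    return A, B

if __name__ == "__main__":
    # Toy usage: compute a distillation loss and compress one weight matrix.
    teacher_logits = torch.randn(4, 10)
    student_logits = torch.randn(4, 10, requires_grad=True)
    print("KD loss:", distillation_loss(student_logits, teacher_logits).item())

    W = torch.randn(768, 768)
    A, B = low_rank_factorize(W, rank=64)
    print("rank-64 reconstruction error:", torch.norm(W - A @ B).item())
```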