Data Selection for Multi-turn Dialogue Instruction Tuning
arXiv:2604.07892v3 Announce Type: replace
Abstract: Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and …