WRAP++: Web discoveRy Amplified Pretraining
arXiv:2604.06829v1 Announce Type: cross
Abstract: Synthetic data rephrasing has emerged as a powerful technique for enhancing knowledge acquisition during large language model (LLM) pretraining. However, existing approaches operate at the single-docum…