Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
arXiv:2511.21613v2 Announce Type: replace
Abstract: Incorporating metadata into Large Language Model (LLM) pretraining has recently emerged as a promising approach to accelerate training. However, prior work highlighted only one useful signal, URLs, lea…