Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models
By Hugging Face - Blog / March 20, 2024