cs.CL

Efficient Pre-Training with Token Superposition

arXiv:2605.06546v1 Announce Type: new
Abstract: Pre-training large language models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications to achieve high data throughput. In this work, we pr…