cs.CL

Efficient Pre-Training with Token Superposition

arXiv:2605.06546v1 Announce Type: new
Abstract: Pre-training large language models is often prohibitively expensive and inefficient at scale, requiring complex and invasive modifications to achieve high data throughput. In this work, we pr…