Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling
arXiv:2604.21724v2 Announce Type: replace
Abstract: Large token-indexed lookup tables provide a compute-decoupled scaling path, but their practical gains are often limited by poor parameter efficiency and rapid memory growth. We attribute these limita…