FBS: Modeling Native Parallel Reading inside a Transformer
arXiv:2601.21708v2
Abstract: Large language models (LLMs) excel across many tasks, yet inference is still dominated by strictly token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss co…