Author name: Stephen J. Thomas

Gated Subspace Inference for Transformer Acceleration

Stephen J. Thomas / May 6, 2026

arXiv:2605.03109v1 Announce Type: cross
Abstract: A method is presented for accelerating inference in transformer language models by exploiting the low effective rank of the token activation manifold at each layer. The method decomposes each activatio…

cs.AI, cs.LG

Cascade Token Selection for Transformer Attention Acceleration

Stephen J. Thomas / May 6, 2026

arXiv:2605.03110v1 Announce Type: cross
Abstract: A method is presented for reducing the cost of representative token selection in transformer attention layers by exploiting the coherence of the representative set across depth. Activation Decorrelatio…