Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models
arXiv:2604.15153v2 Announce Type: replace-cross
Abstract: Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims…
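The truncated abstract does not spell out the paper's exact merging mechanism, but the idea of $K$-token merging can be illustrated with a minimal sketch: mean-pool every group of $K$ consecutive token embeddings so a length-$n$ sequence shrinks to roughly $n/K$ latent vectors before attention. The function name and the mean-pooling choice below are assumptions for illustration, not the paper's method.

```python
import numpy as np

def merge_k_tokens(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Illustrative sketch (not the paper's method): compress a
    (seq_len, dim) embedding sequence by mean-pooling every group
    of k consecutive token embeddings."""
    seq_len, dim = embeddings.shape
    # Pad with zero rows so the length is a multiple of k.
    pad = (-seq_len) % k
    counts = np.ones(seq_len + pad)
    if pad:
        embeddings = np.vstack([embeddings, np.zeros((pad, dim))])
        counts[seq_len:] = 0.0  # padding rows must not dilute the mean
    groups = embeddings.reshape(-1, k, dim)
    group_counts = counts.reshape(-1, k).sum(axis=1, keepdims=True)
    return groups.sum(axis=1) / group_counts

# 6 tokens of dimension 2, merged 3-at-a-time -> 2 latent vectors
seq = np.arange(12, dtype=float).reshape(6, 2)
merged = merge_k_tokens(seq, k=3)
print(merged.shape)  # (2, 2)
```

Since self-attention cost grows with the square of sequence length, merging by a factor of $K$ cuts the attention FLOPs by roughly $K^2$, which is the motivation the abstract gives for compressing in the latent space.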