cs.AI, cs.CL, cs.LG

Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference

arXiv:2601.07667v2 Announce Type: replace-cross
Abstract: With the growing prevalence of large language models (LLMs), key-value (KV) cache reduction for LLM inference has received considerable attention. Among the numerous works that have been proposed in recent…
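To make the idea concrete, the following is a minimal, illustrative sketch of layer-wise token pruning for a KV cache: each layer keeps only the cached tokens that have received the most attention, with deeper layers pruning more aggressively. This is not the paper's method; the function names, the cumulative-attention scoring, and the `keep_ratio` schedule are all illustrative assumptions.

```python
import numpy as np

def prune_kv_cache(keys, values, attn_scores, keep_ratio):
    """Keep the cached tokens with the highest attention for one layer.

    keys, values : (seq_len, head_dim) arrays, one layer's KV cache.
    attn_scores  : (seq_len,) cumulative attention each cached token received
                   (an illustrative importance score, not the paper's).
    keep_ratio   : fraction of tokens to retain (hypothetical parameter).
    """
    seq_len = keys.shape[0]
    k = max(1, int(seq_len * keep_ratio))
    # Top-k tokens by score, kept in their original sequence order.
    keep = np.sort(np.argsort(attn_scores)[-k:])
    return keys[keep], values[keep]

def layer_keep_ratio(layer_idx, num_layers, first=1.0, last=0.25):
    """Illustrative layer-wise schedule: interpolate linearly from keeping
    everything at the first layer to `last` at the final layer."""
    return first + (last - first) * layer_idx / max(1, num_layers - 1)
```

For example, with an 8-token cache and `keep_ratio=0.5`, the 4 tokens with the highest scores survive, halving that layer's KV memory while preserving token order for positional consistency.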