cs.AR, cs.CL, cs.LG

Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

arXiv:2605.09490v1 Announce Type: new
Abstract: Reasoning LLMs produce thousands of chain-of-thought tokens whose KV cache must reside in scarce GPU HBM. The dominant response — permanently evicting low-importance tokens — is catastrophic for reason…
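To make the critiqued baseline concrete, here is a minimal, illustrative sketch (not the paper's method) of importance-based permanent eviction: once the KV cache exceeds its budget, the lowest-importance token's entry is dropped outright and can never be recalled by later reasoning steps. All names (`EvictingKVCache`, the importance scores) are hypothetical.

```python
# Illustrative sketch of the "permanent eviction" baseline the abstract
# critiques: low-importance KV entries are dropped once the cache exceeds
# its budget, so they are unrecoverable for later reasoning steps.
from dataclasses import dataclass, field

@dataclass
class EvictingKVCache:
    budget: int  # max tokens kept resident (stand-in for scarce HBM)
    entries: list = field(default_factory=list)  # (token_id, importance, kv)

    def append(self, token_id, importance, kv):
        self.entries.append((token_id, importance, kv))
        if len(self.entries) > self.budget:
            # Permanently evict the lowest-importance token.
            victim = min(range(len(self.entries)),
                         key=lambda i: self.entries[i][1])
            del self.entries[victim]

    def resident_tokens(self):
        return [t for t, _, _ in self.entries]

cache = EvictingKVCache(budget=3)
for tok, imp in [(0, 0.9), (1, 0.1), (2, 0.8), (3, 0.7)]:
    cache.append(tok, imp, kv=None)
print(cache.resident_tokens())  # token 1 (importance 0.1) was evicted
```

The failure mode the abstract points at follows directly: if a "low-importance" token later turns out to matter for the chain of thought, its KV pair is gone; a memory hierarchy would instead demote it to cheaper storage rather than delete it.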