Zehao Fan, Garrett Gagnon, Zhenyu Liu, Liu Liu

Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM

Zehao Fan, Garrett Gagnon, Zhenyu Liu, Liu Liu / May 13, 2026

arXiv:2505.05772v2 Announce Type: replace
Abstract: Transformer-based models are the foundation of modern machine learning, but their execution, particularly during autoregressive decoding in large language models (LLMs), places significant pressure o…
