Yash Akhauri, Mohamed S. Abdelfattah

Compute Where it Counts: Self Optimizing Language Models

Yash Akhauri, Mohamed S. Abdelfattah / May 12, 2026

arXiv:2605.10875v1 Announce Type: cross
Abstract: Efficient LLM inference research has largely focused on reducing the cost of each decoding step (e.g., using quantization, pruning, or sparse attention), typically applying a uniform computation budget…
