Cornelius Kummer, Lena Jurkschat, Michael Färber, Sahar Vahdati

Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference

Cornelius Kummer, Lena Jurkschat, Michael Färber, Sahar Vahdati / April 6, 2026

arXiv:2604.02985v1 Announce Type: cross
Abstract: With the wide adoption of language models for IR, and specifically RAG systems, the latency of the underlying LLM becomes a crucial bottleneck, since the long contexts of retrieved passages lead la…
