cs.AI, cs.MA

QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs

arXiv:2605.03884v1 Announce Type: new
Abstract: Multi-agent LLM systems on edge devices need to hand off latent context efficiently, but the practical options today are either an expensive re-prefill or a full-precision KV-cache transfer. We study QKVShare, a framework…
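The abstract is truncated, so QKVShare's actual quantization scheme is not specified here. As a minimal sketch of the general idea it names (handing off a KV cache in low precision rather than re-prefilling or shipping fp32 tensors), the following uses symmetric per-channel int8 quantization with fp16 scales; the function names, tensor shapes, and quantization choices are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    # Symmetric int8 quantization along the last axis: scale = max|x| / 127.
    # (Illustrative choice; the paper's actual scheme may differ.)
    scale = np.abs(kv).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round(kv / scale).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale.astype(np.float32)

# Toy KV cache: (layers, heads, seq_len, head_dim), fp32.
rng = np.random.default_rng(0)
kv = rng.standard_normal((2, 4, 16, 64)).astype(np.float32)

q, scale = quantize_kv(kv)
kv_hat = dequantize_kv(q, scale)

full_bytes = kv.nbytes                   # fp32 handoff payload
handoff_bytes = q.nbytes + scale.nbytes  # int8 values + fp16 scales
max_err = float(np.abs(kv - kv_hat).max())
print(full_bytes, handoff_bytes, max_err)
```

The receiving agent would dequantize on arrival and resume decoding, trading a small reconstruction error for roughly a 4x smaller transfer than full-precision fp32.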