cs.CL

River-LLM: Large Language Model Seamless Exit Based on KV Share

arXiv:2604.18396v1 Announce Type: new
Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse domains but are increasingly constrained by high inference latency. Early Exit has emerged as a promising solution to…