cs.AI, cs.CL, cs.DC, cs.OS

StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving

arXiv:2603.28795v1 Announce Type: cross
Abstract: We address LLM serving workloads where repeated requests share a common solution structure but differ in localized constraints, such as output schema, variable names, or numeric constants. Prior cachin…