StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving
arXiv:2603.28795v1 Announce Type: cross
Abstract: We address LLM serving workloads where repeated requests share a common solution structure but differ in localized constraints, such as output schema, variable names, or numeric constants. Prior cachin…
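
To make the workload concrete, the following is a minimal sketch, not the paper's implementation, of step-level reuse with a lightweight check and selective patching, as named in the title. Everything in it is an assumption for illustration: the `Request` fields, the cached steps, the token-based `verify_step` check, and the string substitution in `patch_step`, which stands in for whatever regeneration a real system would perform on a failing step.

```python
import re
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request: only the localized constraints are modeled."""
    var_name: str   # e.g. the variable name the answer must use
    constant: int   # e.g. a numeric constant in the task


# Hypothetical cache entry: an earlier request and its solution steps.
CACHED_REQUEST = Request(var_name="x", constant=10)
CACHED_STEPS = [
    "Define the input list.",
    "Set x = 10 as the threshold.",
    "Return all items greater than x.",
]


def _stale_tokens(cached: Request, new: Request) -> list[str]:
    """Tokens from the cached request that no longer match the new one."""
    stale = []
    if cached.var_name != new.var_name:
        stale.append(cached.var_name)
    if cached.constant != new.constant:
        stale.append(str(cached.constant))
    return stale


def verify_step(step: str, cached: Request, new: Request) -> bool:
    """Lightweight check: a step passes if it mentions no stale token."""
    return not any(
        re.search(rf"\b{re.escape(tok)}\b", step)
        for tok in _stale_tokens(cached, new)
    )


def patch_step(step: str, cached: Request, new: Request) -> str:
    """Stand-in for regeneration: rewrite stale tokens in a failing step."""
    step = re.sub(rf"\b{re.escape(cached.var_name)}\b",
                  lambda _m: new.var_name, step)
    step = re.sub(rf"\b{re.escape(str(cached.constant))}\b",
                  lambda _m: str(new.constant), step)
    return step


def reuse_steps(new: Request) -> tuple[list[str], int]:
    """Reuse cached steps, patching only those that fail verification."""
    out, patched = [], 0
    for step in CACHED_STEPS:
        if verify_step(step, CACHED_REQUEST, new):
            out.append(step)            # reused unchanged
        else:
            out.append(patch_step(step, CACHED_REQUEST, new))
            patched += 1                # only this step was redone
    return out, patched
```

Calling `reuse_steps(Request(var_name="limit", constant=25))` in this sketch leaves the first step untouched and rewrites only the two steps that mention the old variable name or constant. The point of the design is that the cost of handling a new request scales with the number of steps that fail the check rather than with the full solution length.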