cs.AI, cs.DC

SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism

arXiv:2506.01979v4 Announce Type: replace-cross
Abstract: Speculative decoding (SD) has recently emerged as a promising technique for accelerating LLM inference: a small draft model proposes draft tokens in advance, and these are validated in …
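
The draft-then-validate loop that the abstract describes can be illustrated with a minimal sketch of generic greedy speculative decoding. This is not the SpecBranch method. It assumes greedy (argmax) decoding, and every name here (`speculative_step`, `generate`, `toy_target`, `toy_draft`) is hypothetical. The "models" are toy next-token functions over integers. The check runs token by token for readability, whereas a real system would score all drafted positions in a single target-model forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # hypothetical greedy next-token function


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """Run one draft-then-validate round and return the tokens to append."""
    # Draft phase: the small model proposes k tokens autoregressively.
    drafted: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # Validation phase: keep drafted tokens while they match the target model.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: emit the target's token and discard the rest
            # of the draft (the rollback).
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every drafted token was accepted: the target contributes one more token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextToken,
             target_next: NextToken, k: int, max_new: int) -> List[Token]:
    """Repeat speculative rounds until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + max_new]


def toy_target(ctx: List[Token]) -> Token:
    # Toy "large" model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Toy "small" model: agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], toy_draft, toy_target, k=3, max_new=8))
```

In this greedy setting the output matches what the target model alone would generate. The draft model affects only how many tokens each round yields: a fully accepted draft yields `k + 1` tokens, while a mismatch at the first drafted position yields one.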