Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang

SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism

April 15, 2026

arXiv:2506.01979v4 Announce Type: replace-cross
Abstract: Recently, speculative decoding (SD) has emerged as a promising technique to accelerate LLM inference by employing a small draft model to propose draft tokens in advance, and validating them in …
