SpecTr-GBV: Multi-Draft Block Verification Accelerating Speculative Decoding
arXiv:2604.25925v1 Announce Type: new
Abstract: Autoregressive language models suffer from high inference latency due to their sequential decoding nature. Speculative decoding (SD) mitigates this by employing a lightweight draft model to propose candi…
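The abstract is cut off before it describes SpecTr-GBV's own multi-draft block verification. The sketch below therefore shows only the standard single-draft, token-by-token speculative sampling rule that SD methods build on, and it is not the paper's algorithm. Under this rule the target model accepts a draft token x with probability min(1, p(x)/q(x)), where p is the target distribution and q is the draft distribution. On the first rejection it resamples from the normalized residual max(0, p - q) and stops. If every draft token is accepted, it samples one bonus token from the target. The function names (`verify_draft`, `residual`, `sample`) and the list-of-probability-rows input format are hypothetical choices for illustration.

```python
import random
from typing import List, Sequence

def residual(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(0, p - q): the distribution sampled after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # Degenerate case (p == q): fall back to the target distribution.
        return list(p)
    return [d / total for d in diff]

def sample(dist: Sequence[float], rng: random.Random) -> int:
    """Draw one token index from a probability vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]

def verify_draft(target_probs: Sequence[Sequence[float]],
                 draft_probs: Sequence[Sequence[float]],
                 draft_tokens: Sequence[int],
                 rng: random.Random) -> List[int]:
    """Standard single-draft speculative sampling verification (illustrative).

    target_probs has len(draft_tokens) + 1 rows (one extra for the bonus token);
    draft_probs has len(draft_tokens) rows.
    """
    out: List[int] = []
    for i, x in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        # Accept the draft token with probability min(1, p[x] / q[x]).
        if q[x] > 0 and rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
        else:
            # First rejection: resample from the residual and stop.
            out.append(sample(residual(p, q), rng))
            return out
    # All draft tokens accepted: sample one bonus token from the target.
    out.append(sample(target_probs[len(draft_tokens)], rng))
    return out
```

When the draft and target distributions agree, every draft token is accepted and one extra token is emitted. When the target assigns zero probability to a draft token, that token is always rejected and replaced. This sequential, per-token acceptance is the step that multi-draft and block-level verification schemes aim to improve on.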