cs.CL

LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

arXiv:2507.01449v3 Announce Type: replace
Abstract: Speculative decoding (SD), in which a small draft model proposes draft tokens in advance and the target model then validates them in parallel, has emerged as a promising technique for LLM …
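
To make the draft-then-verify loop described in the abstract concrete, here is a minimal sketch of generic speculative decoding with greedy, exact-match verification. It is not LogitSpec's method and not taken from the paper: the names `speculative_step`, `generate`, `toy_draft`, and `toy_target` are hypothetical, both "models" are toy functions over integer tokens, and the verification loop only simulates what a real target model would compute in one parallel forward pass.

```python
from typing import Callable, List, Sequence

# Assumption: a "model" is any function mapping a token context to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(prefix: Sequence[int], draft_next: NextToken,
                     target_next: NextToken, k: int = 4) -> List[int]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # 1) Draft phase: the cheap model proposes k tokens autoregressively.
    draft: List[int] = []
    for _ in range(k):
        draft.append(draft_next(list(prefix) + draft))

    # 2) Verify phase: the target's greedy choice at each of the k + 1 positions.
    #    A real system obtains these from one batched forward pass; this loop
    #    only simulates that.
    target_choices = [target_next(list(prefix) + draft[:i]) for i in range(k + 1)]

    # 3) Accept the longest prefix of the draft that matches the target.
    accepted: List[int] = []
    for i in range(k):
        if draft[i] != target_choices[i]:
            break
        accepted.append(draft[i])

    # The target's own token at the first mismatch (or a bonus token if every
    # draft token matched) is always valid, so each step yields >= 1 token.
    accepted.append(target_choices[len(accepted)])
    return accepted


def generate(prompt: Sequence[int], draft_next: NextToken, target_next: NextToken,
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Repeat speculative steps until max_new_tokens tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prompt) + max_new_tokens]


# Toy models (hypothetical): the target counts upward modulo 10; the draft
# agrees with it except right after token 4, where it guesses wrong.
def toy_target(ctx: Sequence[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: Sequence[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step([1], toy_draft, toy_target, k=4))
    print(generate([1], toy_draft, toy_target, max_new_tokens=6, k=4))
```

With greedy verification, the output is identical to decoding with the target alone. The draft only changes how many target passes are needed, which is why a higher acceptance rate translates into a speedup.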