cs.AI, cs.CL, cs.LG

SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration

arXiv:2604.12247v1 Announce Type: cross
Abstract: Speculative decoding has emerged as a promising approach to accelerate autoregressive inference in large language models (LLMs). Self-draft methods, which leverage the base LLM itself for speculation, …