cs.CL, cs.LG

ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding

arXiv:2604.14612v1 Announce Type: new
Abstract: Self-speculative decoding is an inference technique for large language models designed to speed up generation without sacrificing output quality. It combines fast, approximate decoding using a compact ve…
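The draft-then-verify idea behind self-speculative decoding can be illustrated with a toy sketch. This is not the ConfLayers method: the abstract is truncated here, so how layers are chosen is not shown, and the fixed `skip` set below is a hypothetical stand-in for any layer-selection rule. The "model" is a list of integer functions, and `run_layers`, `greedy_generate`, `self_speculative_generate`, `LAYERS` and `VOCAB` are names invented for illustration. The sketch assumes greedy decoding, where verification reduces to checking that each drafted token matches what the full model would emit.

```python
from typing import Callable, FrozenSet, List, Sequence

VOCAB = 50  # toy vocabulary size (assumption for this sketch)
Layer = Callable[[int], int]

# Toy "layers": each one transforms an integer state (hypothetical, not a real network).
LAYERS: List[Layer] = [
    lambda s: s * 3 + 1,
    lambda s: s ^ 5,
    lambda s: s + 7,
    lambda s: s * 2 + 3,
]


def run_layers(layers: Sequence[Layer], tokens: Sequence[int],
               skip: FrozenSet[int] = frozenset()) -> int:
    """Toy forward pass: fold the context into a state, apply every layer
    whose index is not in `skip`, and return the next token id."""
    state = sum((i + 1) * t for i, t in enumerate(tokens))
    for idx, layer in enumerate(layers):
        if idx not in skip:
            state = layer(state)
    return state % VOCAB


def greedy_generate(layers: Sequence[Layer], prompt: Sequence[int],
                    n_new: int) -> List[int]:
    """Baseline: one full-model pass per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(run_layers(layers, tokens))
    return tokens


def self_speculative_generate(layers: Sequence[Layer], prompt: Sequence[int],
                              n_new: int, skip: FrozenSet[int],
                              draft_len: int = 4) -> List[int]:
    """Draft with the layer-skipped model, then verify with the full model."""
    tokens = list(prompt)
    target = len(prompt) + n_new
    while len(tokens) < target:
        # Draft phase: the same model with some layers skipped proposes tokens.
        draft: List[int] = []
        for _ in range(min(draft_len, target - len(tokens))):
            draft.append(run_layers(layers, tokens + draft, skip))
        # Verify phase: the full model checks each drafted position.
        # (A real system would score all positions in one batched pass.)
        for i, tok in enumerate(draft):
            full_tok = run_layers(layers, tokens + draft[:i])
            if full_tok != tok:
                # Keep the accepted prefix, then substitute the full model's token.
                tokens.extend(draft[:i])
                tokens.append(full_tok)
                break
        else:
            tokens.extend(draft)  # every drafted token was accepted
    return tokens


if __name__ == "__main__":
    prompt = [1, 2, 3]
    print(self_speculative_generate(LAYERS, prompt, 10, frozenset({1})))
    print(greedy_generate(LAYERS, prompt, 10))
```

Because every emitted token is either confirmed or replaced by the full model, the two printed sequences are identical whatever `skip` contains. The choice of skipped layers affects only how many drafted tokens are accepted, and therefore the speed, not the output.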