ConFu: Contemplate the Future for Better Speculative Sampling
arXiv:2603.08899v3 Announce Type: replace-cross
Abstract: Speculative decoding has emerged as a powerful approach to accelerate large language model (LLM) inference by employing lightweight draft models to propose candidate tokens that are subsequentl…
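
As an illustration of the general draft-then-check idea behind speculative decoding, the sketch below shows a greedy variant in Python. It is not ConFu's method and does not reconstruct the truncated abstract. It is a minimal sketch under stated assumptions: the models are toy deterministic functions, and every name (`speculative_greedy_decode`, `toy_target`, `toy_draft`, `greedy_decode`, the parameter `k`) is hypothetical. A practical system would score all drafted positions in a single batched forward pass of the target model, whereas this sketch calls the target sequentially for clarity.

```python
from typing import Callable, List, Sequence

# A "model" here is just a function mapping a token context to the next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(next_token: NextToken, prompt: Sequence[int],
                  max_new_tokens: int) -> List[int]:
    """Plain autoregressive greedy decoding, used as the reference."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_greedy_decode(target_next: NextToken, draft_next: NextToken,
                              prompt: Sequence[int], max_new_tokens: int,
                              k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft up to k tokens, keep the agreed prefix."""
    tokens = list(prompt)
    goal = len(tokens) + max_new_tokens
    while len(tokens) < goal:
        # Draft phase: the cheap model proposes a short continuation.
        budget = min(k, goal - len(tokens))
        proposal: List[int] = []
        for _ in range(budget):
            proposal.append(draft_next(tokens + proposal))

        # Check phase: accept drafted tokens while they match the target's choice.
        accepted: List[int] = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: substitute the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Every drafted token matched; take one extra target token if room remains.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target_next(tokens + accepted))

        tokens.extend(accepted)
    return tokens


def toy_target(ctx: Sequence[int]) -> int:
    """Hypothetical stand-in for the large model (arbitrary deterministic rule)."""
    return (31 * sum(ctx) + len(ctx)) % 50


def toy_draft(ctx: Sequence[int]) -> int:
    """Hypothetical stand-in for the draft model: agrees with the target most of the time."""
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    print(speculative_greedy_decode(toy_target, toy_draft, prompt, max_new_tokens=12))
```

Because a drafted token is kept only when it equals the target's own greedy choice, this variant returns the same sequence as plain greedy decoding with the target model. Any speed-up comes solely from how many drafted tokens are accepted per round.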