TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs
arXiv:2510.15545v4 Announce Type: replace-cross
Abstract: Accelerating the inference of large language models (LLMs) has been a critical challenge in generative AI. Speculative decoding (SD) substantially improves LLM inference efficiency. However, it…