LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
arXiv:2502.17421v4
Abstract: As Large Language Models (LLMs) can now process extremely long contexts, efficient inference over these extended inputs has become increasingly important, especially for emerging applications l…