Accelerating decode-heavy LLM inference with speculative decoding on AWS Trainium and vLLM

By Yahav Biran / April 15, 2026

In this post, you will learn how speculative decoding works and why it helps reduce cost per generated token on AWS Trainium2.