Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs

Eighteen months ago, Andrej Karpathy set a challenge: “Can you take my 2h13m tokenizer video and translate the video into the format of a book chapter”. We’ve done it. The chapter below inlines the key pieces of code and includes images from the video at key points, each hyperlinked to its video timestamp. The video is a great way to learn this key piece of how LLMs work, and this new text version covers the same material in written form.
