Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs
Eighteen months ago, Andrej Karpathy set a challenge: “Can you take my 2h13m tokenizer video and translate the video into the format of a book chapter”. We’ve done it. The chapter is below, with key pieces of code inlined and images from the video…