Recommendations for Getting the Most Out of a Technical Book
This short article compiles a few notes I have previously shared when readers asked how to get the most out of my books on building large language models from scratch…
After I shared my Big LLM Architecture Comparison a few months ago, which focused on the main transformer-based LLMs, I received a lot of questions with…
The DGX Spark for local LLM inference and fine-tuning was a pretty popular discussion topic recently. I got to play with one myself, primarily working…
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
Previously, I compared the most notable open-weight architectures of 2025 in The Big LLM Architecture Comparison. Then, I zoomed in and discussed the…
OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, thanks…
It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and…
The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025.
KV caches are one of the most critical techniques for efficient LLM inference in production. They are an important component for compute-efficient…
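To make the idea concrete, here is a minimal, illustrative sketch (not taken from the article itself) of how a KV cache avoids recomputation during autoregressive decoding: each new token's key and value vectors are appended once and reused at every later step, so attention only ever computes the new query against the stored history. The `attention` and `KVCache` names are hypothetical, chosen for this example.

```python
import math

def attention(query, keys, values):
    # Scaled dot-product attention for a single query vector over the
    # cached keys/values (plain Python lists, one vector per past token).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

class KVCache:
    """Stores the keys and values of previously generated tokens.

    Without the cache, step t would recompute keys/values for all t
    previous tokens; with it, each step appends exactly one new pair.
    """
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, query, key, value):
        # Append this step's key/value, then attend over the full history.
        self.keys.append(key)
        self.values.append(value)
        return attention(query, self.keys, self.values)
```

A real implementation would maintain one cache per layer (and per attention head) and store tensors rather than lists, but the growth pattern is the same: memory scales with sequence length while per-step compute stays roughly constant.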
Why build an LLM from scratch? It’s probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a lot…