DGX Spark and Mac Mini for Local PyTorch Development
The DGX Spark for local LLM inference and fine-tuning was a pretty popular discussion topic recently. I got to play with one myself, primarily working…
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
Previously, I compared the most notable open-weight architectures of 2025 in The Big LLM Architecture Comparison. Then, I zoomed in and discussed the…
OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, thanks…
It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and…
The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025.
KV caches are one of the most critical techniques for efficient LLM inference in production, and an important component for compute-efficient…
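The idea behind a KV cache is that, during autoregressive decoding, each token's attention keys and values are computed once and reused, rather than recomputed for the whole prefix at every step. A minimal illustrative sketch (my own toy code, not taken from the article, using a single attention head and plain Python lists):

```python
# Toy KV cache sketch: append each new token's key/value once,
# then attend over all cached entries at every decoding step.
import math

def attention(q, keys, values):
    # Scaled dot-product attention for a single query vector.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Store this token's K and V once; reuse everything cached so far.
        self.keys.append(k)
        self.values.append(v)
        return attention(q, self.keys, self.values)

cache = KVCache()
cache.step([1.0, 0.0], [1.0, 0.0], [1.0, 2.0])
cache.step([0.0, 1.0], [0.0, 1.0], [3.0, 4.0])
print(len(cache.keys))  # → 2, one cached entry per decoded token
```

The trade-off the articles on this topic usually discuss: decoding a sequence of length n costs O(n) attention work per step with the cache instead of O(n) recomputation of all keys and values, at the price of memory that grows linearly with sequence length, layers, and heads.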
Why build an LLM from scratch? It’s probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a lot…
A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to…
As you know, I’ve been writing a lot lately about the latest research on reasoning in LLMs. Before my next research-focused blog post, I wanted to offer…