Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs
From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs
How coding agents use tools, memory, and repo context to make LLMs work better in practice
From MHA and GQA to MLA, sparse attention, and hybrid architectures
A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to…