Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

By Sebastian Raschka, PhD / May 16, 2026

From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs