DeepSeek just released V4. Pro is 1.6T total parameters with 49B active; Flash is 284B total with 13B active. Both MIT licensed, both 1M context.
https://huggingface.co/collections/deepseek-ai/deepseek-v4
1M context on open weights is the part I cannot stop thinking about. Until today, if you wanted that length you were paying Gemini or Claude prices. Now it is downloadable under MIT. That is a genuine shift in what self-hosted long-context work looks like.
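To make "self-hosted long context" concrete, here is a back-of-envelope on KV-cache memory at 1M tokens. Every architecture number below is a placeholder I made up, not something from the tech report, and DeepSeek's previous models used MLA, which compresses the cache well below a naive GQA estimate like this:

```python
# Rough KV-cache sizing at 1M tokens. All architecture numbers here are
# placeholder guesses, NOT from the V4 tech report.
layers = 60          # hypothetical layer count
kv_heads = 8         # hypothetical GQA key/value head count
head_dim = 128       # hypothetical per-head dimension
bytes_per = 2        # fp16/bf16 cache; halve for fp8

tokens = 1_000_000
# K and V each store layers * kv_heads * head_dim values per token
kv_bytes = tokens * layers * 2 * kv_heads * head_dim * bytes_per
print(f"KV cache at 1M tokens: {kv_bytes / 2**30:.0f} GiB")  # ~229 GiB with these guesses
```

Even as a loose upper bound, that is multi-GPU territory before you touch the weights, which is why the context length and the activation ratio have to be read together.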
The 49B active on a 1.6T Pro model is the other thing. That is roughly a 3% activation ratio, which is aggressive. If the quality holds at that activation count, the inference economics are going to reshape routing decisions across the board.
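Just arithmetic on the headline numbers, nothing from the tech report; the 2 FLOPs per active parameter per token is the standard decode approximation:

```python
# Active-parameter ratios straight from the announced figures.
models = {"V4 Pro": (49e9, 1.6e12), "V4 Flash": (13e9, 284e9)}

for name, (active, total) in models.items():
    ratio = active / total
    # Decode compute scales with active params: ~2 FLOPs per param per token
    flops_per_token = 2 * active
    print(f"{name}: {ratio:.1%} active, ~{flops_per_token / 1e9:.0f} GFLOPs/token")
# V4 Pro:   3.1% active, ~98 GFLOPs/token
# V4 Flash: 4.6% active, ~26 GFLOPs/token
```

At ~3% activation, per-token compute on Pro is closer to a dense ~50B model than to anything else in the 1.6T class, which is the whole routing argument.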
Tech report is in the HF repo for anyone wanting the training details.
Obviously this is launch-day framing from DeepSeek, so the real story will land in a week when people start running it on actual workloads. But on paper this is the biggest open-weight release of the year so far.