In the V3.2 paper, the authors noted:
> Second, token efficiency remains a challenge; DeepSeek-V3.2 typically requires longer generation trajectories (i.e., more tokens) to match the output quality of models like Gemini 3.0-Pro. Future work will focus on optimizing the intelligence density of the model’s reasoning chains to improve efficiency.
However, in V4 Pro the situation seems to have worsened. Even the non-thinking mode uses significantly more tokens than V3.2, and V4 Pro (1.6T parameters) is roughly 2.5x larger than V3.2 (0.67T). This suggests the model's intelligence density has decreased rather than improved!
If we compare it with GPT-5.4 and GPT-5.5, the gap is even larger: DeepSeek appears to require around 10x more tokens to achieve similar performance. Assuming the same TPS, this implies roughly 10x longer wall-clock time for DeepSeek V4 Pro to complete the same task.