As far as I know, the weights are about 160 GB, plus 9.6 GB of KV cache for the max 1M-token context window, plus ~5 GB of overhead = ~175 GB of VRAM.
But vLLM and other sources say "To use the full 1M context, you need 4x A100 80G" -- that's 320 GB of VRAM?? Am I missing something??
Sources:
- https://lushbinary.com/blog/deepseek-v4-self-hosting-guide-vllm-hardware-deployment/?hl=en-GB
- vLLM deployment blog post
The 9.6 GB figure is also sourced from the vLLM blog page, and the official model page says the KV cache takes ~10% of what 3.2 used to take.
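For clarity, here's the arithmetic I'm doing, as a quick sketch (all figures are my own estimates from the sources above, not measured numbers):

```python
# Back-of-the-envelope VRAM math (figures from the blog posts above,
# not measured -- treat them as estimates).
weights_gb = 160.0   # model weights
kv_cache_gb = 9.6    # KV cache for the full 1M-token window (per the vLLM blog)
overhead_gb = 5.0    # rough runtime overhead

my_total = weights_gb + kv_cache_gb + overhead_gb
print(round(my_total, 1))   # my estimate: 174.6 GB

# What vLLM's recommendation adds up to:
vllm_total = 4 * 80         # 4x A100 80G
print(vllm_total)           # 320 GB

# The gap my question is about:
print(round(vllm_total - my_total, 1))
```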