How much VRAM do we need at most to run DeepSeek V4 Flash: 175 GB or 320 GB?

As far as I know, the weights are 160 GB, plus 9.6 GB needed for the maximum 1 million token window, plus 5 GB of overhead, which comes to about 175 GB of VRAM.

But vLLM and other sources say "To use the full 1M context, you need 4x A100 80G", which is 320 GB of VRAM. Am I missing something?
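
Here is a quick sketch of the arithmetic above, using only the numbers quoted in this post (none of them verified here). The `min_gpus_power_of_two` helper is a hypothetical illustration of one possible explanation for the gap: it assumes the GPU count has to be a power of two, as is common for tensor-parallel setups, which would push a 175 GB requirement from 3 cards up to 4. That is an assumption on my part, not something the sources state.

```python
import math

# Figures quoted in the post, in GB. They are not independently verified.
WEIGHTS_GB = 160.0
KV_CACHE_1M_GB = 9.6   # KV cache at the full 1M-token window
OVERHEAD_GB = 5.0
GPU_GB = 80.0          # one A100 80G


def total_vram_gb(weights: float, kv_cache: float, overhead: float) -> float:
    """Sum the three memory components of the estimate."""
    return weights + kv_cache + overhead


def min_gpus(total_gb: float, gpu_gb: float) -> int:
    """Smallest number of cards whose combined memory covers the total."""
    return math.ceil(total_gb / gpu_gb)


def min_gpus_power_of_two(total_gb: float, gpu_gb: float) -> int:
    """Same as min_gpus, but ASSUMING the count must be a power of two."""
    n = 1
    while n * gpu_gb < total_gb:
        n *= 2
    return n


total = total_vram_gb(WEIGHTS_GB, KV_CACHE_1M_GB, OVERHEAD_GB)
cards = min_gpus(total, GPU_GB)
cards_pow2 = min_gpus_power_of_two(total, GPU_GB)

print(f"estimated total: {total:.1f} GB")
print(f"cards by raw capacity: {cards} ({cards * GPU_GB:.0f} GB)")
print(f"cards if power of two: {cards_pow2} ({cards_pow2 * GPU_GB:.0f} GB)")
```

Under that assumption, the 320 GB figure would be the installed capacity of the smallest workable card count, not the amount of memory the model actually uses.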

Sources:

  1. https://lushbinary.com/blog/deepseek-v4-self-hosting-guide-vllm-hardware-deployment/?hl=en-GB
  2. vLLM blog post on deployment

The 9.6 GB figure also comes from the vLLM blog page, and the official model page says it takes 10% of the KV cache that 3.2 used to take.

submitted by /u/9r4n4y