/u/antirez - Provide.ai

llama.cpp DeepSeek v4 Flash experimental inference

/u/antirez / April 26, 2026

Hi, here you can find experimental llama.cpp support for DeepSeek v4, and here is the GGUF you can use to run inference with "just" (lol) 128GB of RAM. The model, even quantized to 2 bits, looks very solid in my limited testing, and …

