I have DeepSeek V4 Pro at home

Just wanted to share that I used u/LegacyRemaster's slightly modified (adds Q4_K_M conversion support) DeepSeek V4 CUDA repo (based on u/antirez's work) to convert and run a Q4_K_M quant of DeepSeek V4 Pro on my Epyc workstation (Genoa 9374F, 12 x 96 GB RAM, single RTX PRO 6000 Max-Q), and it worked right from the start:
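For anyone who wants to reproduce the conversion step, the path I'd expect is the standard upstream llama.cpp two-stage flow sketched below. The script and tool names (`convert_hf_to_gguf.py`, `llama-quantize`, the `Q4_K_M` type) exist in upstream llama.cpp, but the exact flags in the V4 fork may differ, and the paths here are placeholders:

```shell
# Sketch only: convert the HF checkpoint to GGUF, then quantize to Q4_K_M.
# Paths are placeholders; flag details may differ in the V4 fork.
python convert_hf_to_gguf.py /path/to/DeepSeek-V4-Pro \
    --outfile DeepSeek-V4-Pro-F16.gguf --outtype f16
./bin/llama-quantize DeepSeek-V4-Pro-F16.gguf \
    DeepSeek-V4-Pro-Q4_K_M.gguf Q4_K_M
```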

```
(base) phm@epyc:~/projects/llama.cpp-deepseek-v4-flash-cuda/build-cuda$ ./bin/llama-cli -m ../models/DeepSeek-V4-Pro-Q4_K_M.gguf --no-repack -ub 128 --chat-template-file ../models/templates/deepseek-ai-DeepSeek-V3.2.jinja
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 97247 MiB):
  Device 0: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition, compute capability 12.0, VMM: yes, VRAM: 97247 MiB
Loading model...

[llama.cpp ASCII logo]

build      : b8936-44c7b01de
model      : DeepSeek-V4-Pro-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C    stop or exit
  /regen             regenerate the last response
  /clear             clear the chat history
  /read <file>       add a text file
  /glob <pattern>    add text files using globbing pattern

> who are you?

[Start thinking]
Okay, the user is asking "who are you?" This is a simple, introductory question. I need to introduce myself clearly and warmly. I should state my name, creator, and key features that are most relevant to a new user. I can mention that I'm free, my context window, knowledge cutoff, file support, and availability on web and app. I'll end with an open invitation for further questions to keep the conversation going.
[End thinking]

Hi there! I'm DeepSeek, an AI assistant created by the Chinese company DeepSeek (深度求索). I'm here to help you with questions, creative tasks, problem-solving, and pretty much anything you're curious about!

Here's a bit about me:
- **Free to use** - no charges for chatting with me
- **1M context window** - I can handle huge amounts of text at once (like entire book trilogies!)
- **Knowledge cutoff: May 2025** - I'm reasonably up-to-date
- **File upload support** - I can read text from images, PDFs, Word docs, Excel files, and more
- **Web search capability** - though you need to manually enable it via the search button
- **Available on web and mobile app** - with voice input support on the app

I'm a pure text-based model, so I can't "see" images directly, but I can read any text in uploaded files. I aim to be warm, helpful, and detailed in my responses.

What can I help you with today? 😊

[ Prompt: 12.2 t/s | Generation: 8.6 t/s ]

> /exit
Exiting...
common_memory_breakdown_print: | memory breakdown [MiB] | total  free  self  model  context  compute  unaccounted |
common_memory_breakdown_print: |   - CUDA0 (RTX PRO 6000 Blackwell Max-Q Workstation Edition) | 97247 = 4022 + ( 92472 = 87766 + 84 + 4621) + 753 |
common_memory_breakdown_print: |   - Host | 793994 = 793954 + 0 + 39 |
~llama_context: CUDA_Host compute buffer size of 39.1719 MiB, does not match expectation of 15.3535 MiB
```

The model file is 859GB.
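As a back-of-the-envelope check (my arithmetic, using only the numbers in this post), the 859 GB file comfortably fits in the 12 x 96 GB of system RAM, which is why it can run mostly from host memory with the GPU handling what fits in VRAM:

```python
# Rough fit check using figures from the post (not tool output).
file_gb = 859            # Q4_K_M GGUF file size
ram_gb = 12 * 96         # 12 DIMMs of 96 GB each = 1152 GB
headroom_gb = ram_gb - file_gb

print(headroom_gb)       # RAM left over for KV cache, buffers, OS
```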

Update: ran some lineage-bench prompts to check whether the model has a healthy brain; no problems so far.

submitted by /u/fairydreaming
