LocalLLaMA

Running the new Qwen3.6-35B-A3B at full context on both a 4090 and GB10 Spark with vLLM and Llama.cpp

Here is how to run the new Qwen3.6-35B-A3B:

- At full context on a 4090: IQ4_XS GGUF with llama.cpp
- At full context on a Spark: FP8 with a tweaked vLLM

Here is the docker compose with the llama.cpp service:

```yaml
services:
  llamacpp:
    container_name: llama…
```
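The compose snippet in the post is cut off, so as a rough illustration only, a minimal llama.cpp server service for a single-GPU box might look like the sketch below. The image tag, model filename, host path, port, and context size are all assumptions for illustration, not the author's actual config:

```yaml
# Hypothetical sketch, not the author's actual compose file.
services:
  llamacpp:
    container_name: llamacpp
    # Assumed image; pick a server-cuda tag matching your driver/CUDA setup.
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      # Assumed host directory holding the GGUF file.
      - ./models:/models
    ports:
      - "8080:8080"
    command: >
      -m /models/Qwen3.6-35B-A3B-IQ4_XS.gguf
      -c 32768
      -ngl 99
      --host 0.0.0.0
      --port 8080
```

Here `-c 32768` is a placeholder for "full context" (set it to the model's actual maximum), and `-ngl 99` offloads all layers to the GPU. The vLLM side would be a separate service running something along the lines of `vllm serve <model> --quantization fp8 --max-model-len <N>`, with the exact model id and engine tweaks depending on the Spark setup.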