LocalLLaMA

Running the new Qwen3.6-35B-A3B at full context on both a 4090 and GB10 Spark with vLLM and Llama.cpp

Here is how to run the new Qwen3.6-35B-A3B:

- At full context on a 4090: IQ4_XS GGUF with llama.cpp
- At full context on a Spark: FP8 with a tweaked vLLM

Here is the docker compose with the llama.cpp service:

```yaml
services:
  llamacpp:
    container_name: llama…
```
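The compose snippet in the post is cut off, so as a rough illustration only, a minimal llama.cpp server service for a single-GPU box might look like the sketch below. The image tag, model filename, host path, port, and context size are all assumptions for illustration, not the author's actual config:

```yaml
# Hypothetical sketch, not the author's actual compose file.
services:
  llamacpp:
    container_name: llamacpp
    # Assumed image; pick a server-cuda tag matching your driver/CUDA setup.
    image: ghcr.io/ggml-org/llama.cpp:server-cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      # Assumed host directory holding the GGUF file.
      - ./models:/models
    ports:
      - "8080:8080"
    command: >
      -m /models/Qwen3.6-35B-A3B-IQ4_XS.gguf
      -c 32768
      -ngl 99
      --host 0.0.0.0
      --port 8080
```

Here `-c 32768` is a placeholder for "full context" (set it to the model's actual maximum), and `-ngl 99` offloads all layers to the GPU. The vLLM side would be a separate service running something along the lines of `vllm serve <model> --quantization fp8 --max-model-len <N>`, with the exact model id and engine tweaks depending on the Spark setup.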