RTX 5070 Ti 16GB + 32GB RAM: Running Qwen3.6-35B-A3B Q8_0 @ 44 t/s (128K context)
System RAM: 32GB DDR5.

Model: unsloth/Qwen3.6-35B-A3B-GGUF, Q8_0 (36.9 GB)

LM Studio settings:
- GPU Offload: 40
- Offload MoE Experts to CPU: 26
- Try mmap: on
- K cache: Q8_0
- V cache: Q8_0

llama.cpp will be better.

submitted by /u/moahmo88
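
For anyone who wants to try the same setup in llama.cpp directly, here is a rough sketch of how the LM Studio settings above might translate into `llama-server` flags. This is not from the original post: the flag mapping is my assumption (`--n-gpu-layers` for GPU Offload, `--n-cpu-moe` for Offload MoE Experts to CPU), the model filename is hypothetical, and 128K is assumed to mean 131072 tokens. Check `llama-server --help` on your build before relying on it.

```python
import shlex


def build_llama_server_args(
    model_path,
    gpu_layers=40,        # LM Studio "GPU Offload: 40" (assumed mapping)
    cpu_moe_layers=26,    # LM Studio "Offload MoE Experts to CPU: 26" (assumed mapping)
    ctx_size=131072,      # "128K context", assumed to mean 128 * 1024 tokens
    kv_cache_type="q8_0", # K cache and V cache both Q8_0
    use_mmap=True,        # "Try mmap: on"; llama.cpp uses mmap unless told otherwise
):
    """Return an argv list for llama-server approximating the posted settings."""
    args = [
        "llama-server",
        "-m", model_path,
        "--ctx-size", str(ctx_size),
        "--n-gpu-layers", str(gpu_layers),
        "--n-cpu-moe", str(cpu_moe_layers),
        "--cache-type-k", kv_cache_type,
        "--cache-type-v", kv_cache_type,
    ]
    if not use_mmap:
        # Only needed when mmap should be turned off.
        args.append("--no-mmap")
    return args


if __name__ == "__main__":
    # Hypothetical local filename; substitute the path to your own GGUF file.
    print(shlex.join(build_llama_server_args("Qwen3.6-35B-A3B-Q8_0.gguf")))
```

The script only prints the command line, so you can inspect and adjust it before running anything. Depending on the llama.cpp version, a quantized V cache may also require flash attention to be enabled.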