RTX 5070 Ti 16GB + 32GB RAM: Running Qwen3.6-35B-A3B Q8_0 @ 44 t/s (128K context)

32GB DDR5 RAM.

unsloth/Qwen3.6-35B-A3B-GGUF Q8_0 : 36.9 GB
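
A quick back-of-the-envelope check of why the model cannot sit entirely on the GPU. This is only a sketch: it uses the three sizes quoted in this post, treats them all as the same unit, and ignores the KV cache, compute buffers, and whatever the OS and other apps are using.

```python
# Sizes as quoted in the post, all treated as GB.
MODEL_GB = 36.9   # Q8_0 GGUF file size
VRAM_GB = 16.0    # RTX 5070 Ti
RAM_GB = 32.0     # system DDR5

# Weights that cannot fit in VRAM even if the GPU held nothing else.
spill_gb = MODEL_GB - VRAM_GB

# Whether the weights fit at all across GPU memory and system RAM.
fits_combined = MODEL_GB <= VRAM_GB + RAM_GB

print(f"at least {spill_gb:.1f} GB of weights must stay outside VRAM")
print(f"fits in VRAM + RAM combined: {fits_combined}")
```

In practice the share kept in system RAM is larger than this lower bound, because the GPU also has to hold the KV cache for the long context.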

LM Studio settings:

- GPU Offload: 40

- Offload MoE Experts to CPU: 26

- Try mmap: on

- K cache: Q8_0

- V cache: Q8_0

Running llama.cpp directly will probably be faster than this.
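
For anyone who wants to try that, here is a rough sketch of how the LM Studio settings above might translate into a `llama-server` command. The flag mapping is my assumption, not something tested here: I am guessing that "Offload MoE Experts to CPU: 26" corresponds to `--n-cpu-moe 26`, and that 128K context means 131072 tokens. The model filename and the `build_llama_server_cmd` helper are made up for illustration. mmap is left out because llama.cpp uses it by default.

```python
import shlex


def build_llama_server_cmd(model_path, ctx=131072, gpu_layers=40,
                           cpu_moe_layers=26, kv_type="q8_0"):
    """Build an argument list for llama-server (hypothetical helper)."""
    return [
        "llama-server",
        "-m", model_path,                    # path to the GGUF file
        "-c", str(ctx),                      # context size in tokens
        "-ngl", str(gpu_layers),             # layers offloaded to the GPU
        "--n-cpu-moe", str(cpu_moe_layers),  # assumed match for the MoE offload setting
        "-ctk", kv_type,                     # K cache quantization
        "-ctv", kv_type,                     # V cache quantization
    ]


# Hypothetical local filename; use the actual path of the downloaded GGUF.
cmd = build_llama_server_cmd("Qwen3.6-35B-A3B-Q8_0.gguf")
print(shlex.join(cmd))
```

Depending on the llama.cpp build, a quantized V cache may also need flash attention enabled, so check `llama-server --help` for your version before relying on this.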

submitted by /u/moahmo88