Pushing a 5-Year-Old 6GB VRAM laptop to Its Limits: Qwen3.6-35B-A3B

For the past few weeks, I have been trying to get this model working on my hardware. It still feels incredible how much better open models have become. I couldn't have gotten this model to work on my 5yo laptop if not for this sub and its amazing people. The model is actually usable at ~23 t/s...even getting 10+ t/s when unplugged! It is very good to use with pi agent.

If you think this setup can be improved, I'd love to know more...

I've documented my full localmaxxing journey on my blog post here, someone might find it helpful.

TL;DR

Laptop: Asus ROG Zephyrus G14 2020

CPU: Ryzen 7 (8c 16t) @ 2900 Mhz (boost disabled)

Mem: 24GB DDR4-3200 RAM

GPU: RTX 2060 Max-Q 6GB VRAM

General:

#!/bin/bash llama-server \ -m ~/dev/models/Qwen3.6-35B-A3B-APEX-GGUF/Qwen3.6-35B-A3B-APEX-I-Compact.gguf \ -mm ~/dev/models/Qwen3.6-35B-A3B-GGUF/mmproj-F16.gguf \ --no-mmproj-offload \ -a Qwen3.6-35B-A3B-APEX-64k \ --host 0.0.0.0 --port 8000 \ --fit off -fa on \ --ctx-size 65536 \ --threads 8 --threads-batch 12 \ --cpu-range 0-7 --cpu-strict 1 \ --cpu-range-batch 0-11 --cpu-strict-batch 1 \ --numa isolate \ --prio 2 \ --no-mmap --parallel 1 --jinja \ --cache-type-k q8_0 --cache-type-v q8_0 \ --ubatch-size 1024 --batch-size 2048 \ --n-cpu-moe 36 \ --cache-reuse 256 \ --ctx-checkpoints 8 \ --metrics \ --cache-ram 4096 \ --spec-type ngram-mod \ --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 12 --spec-ngram-mod-n-max 48 

Long Context: (Tom's fork)

#!/bin/bash lm-server-tq \ -m ~/dev/models/Qwen3.6-35B-A3B-APEX-GGUF/Qwen3.6-35B-A3B-APEX-I-Compact.gguf \ -a Qwen3.6-35B-A3B-APEX-128k \ --host 0.0.0.0 --port 8000 \ --fit off -fa on \ --ctx-size 131072 \ --threads 8 --threads-batch 12 \ --cpu-range 0-7 --cpu-strict 1 \ --cpu-range-batch 0-11 --cpu-strict-batch 1 \ --numa isolate \ --prio 2 \ --no-mmap --parallel 1 --jinja \ --cache-type-k turbo3 --cache-type-v turbo4 \ --ubatch-size 1024 --batch-size 2048 \ --n-cpu-moe 36 \ --cache-reuse 256 \ --ctx-checkpoints 8 \ --metrics \ --cache-ram 4096 \ --spec-type ngram-mod \ --spec-ngram-mod-n-match 24 --spec-ngram-mod-n-min 12 --spec-ngram-mod-n-max 48 
submitted by /u/abhinand05
[link] [comments]

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top