Comparing Qwen3.5 27B vs Gemma 4 31B for agentic stuff

Models compared:

  • Qwen3.5-27B-UD-Q5_K_XL
  • gemma-4-31B-it-UD-Q5_K_XL

Main flags for both:

```shell
--flash-attn on \
--n-gpu-layers 99 \
--no-mmap \
-c 150000 \
--temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \
--ctx-checkpoints 1 \
--jinja \
-np 1 \
--reasoning on \
--mmproj 'mmproj-BF16.gguf' \
--image-min-tokens 300 --image-max-tokens 512
```
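Assembled into a full `llama-server` invocation, the flags above would look roughly like this. The model filename, `--host`, and `--port` values are assumptions for illustration, not from the original runs — substitute your own GGUF paths:

```shell
#!/bin/sh
# Sketch of a full llama-server launch using the flags from this post.
# Model path, host, and port are placeholders/assumptions.
llama-server \
  --model 'Qwen3.5-27B-UD-Q5_K_XL.gguf' \
  --mmproj 'mmproj-BF16.gguf' \
  --flash-attn on \
  --n-gpu-layers 99 \
  --no-mmap \
  -c 150000 \
  --temp 1 --top-p 0.9 --min-p 0.1 --top-k 20 \
  --ctx-checkpoints 1 \
  --jinja \
  -np 1 \
  --reasoning on \
  --image-min-tokens 300 --image-max-tokens 512 \
  --host 127.0.0.1 --port 8080
```

Swap in `gemma-4-31B-it-UD-Q5_K_XL.gguf` (and its matching mmproj) to run the Gemma side of the comparison with identical settings.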

I know these may not be the best settings and I still need to run more experiments (thank you u/Sadman782), but I find these tests fun and interesting.

| Model | Observations |
| --- | --- |
| Qwen3.5-27B-UD-Q5_K_XL | Takes more steps, checks env vars, and corrects its failures to fully address the request, so the final result is good (in the example, the Telegram message is perfect). Sometimes creates a Python script instead of bash only. |
| gemma-4-31B-it-UD-Q5_K_XL | More direct (smarter at finding URLs) but may miss the final goal (in this example, the Telegram message was truncated). |

Please let me know if you need more tests.
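If you want to reproduce a test yourself, `llama-server` exposes an OpenAI-compatible HTTP API once it is running; a minimal smoke test might look like this (the host/port and the prompt are assumptions — adjust them to your own setup):

```shell
# Minimal smoke test against llama-server's OpenAI-compatible endpoint.
# Host and port are assumptions; match them to your --host/--port flags.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "messages": [
          {"role": "user", "content": "List three uses of the ls command."}
        ],
        "temperature": 1,
        "top_p": 0.9,
        "max_tokens": 256
      }'
```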

https://preview.redd.it/281gn3pddzug1.png?width=1827&format=png&auto=webp&s=7ced859b3cac05ea8fddd0c2ce7a3ea54c9f046b

https://preview.redd.it/nxzhv4pddzug1.png?width=1827&format=png&auto=webp&s=b0ad50fdff1fe9615fd4794040173391ba72fe76

submitted by /u/takoulseum
