qwen 3.6 27B looping problem

Whenever I write here that I use gemma 31B I get answers that qwen 27B is better. I switched in the pi from gemma 31B Q5 to qwen 27B Q8 and generally I manage to code, document and run tests but somewhere after exceeding 100k context qwen keeps getting into loops. Do you have any solution for this?

https://preview.redd.it/o4e1vxkc29zg1.png?width=2575&format=png&auto=webp&s=c6f93e53127b5c8ba798f1c7b503a06172425a0a

https://preview.redd.it/8qriwlrd29zg1.png?width=2747&format=png&auto=webp&s=082cf04774aa7ae77044ff04d5962a2f0606f73a

https://preview.redd.it/xz9lsdde29zg1.png?width=2447&format=png&auto=webp&s=81e4d88a1a0347fc9f6ef743ef612db47557c7b5

I tried to break it and tell him to start over, try again, etc... but it keeps looping

my current command is:

CUDA_VISIBLE_DEVICES=0,1,2 llama-server -c 200000 -m /mnt/models2/Qwen/3.6/Qwen3.6-27B-UD-Q8_K_XL.gguf --host 0.0.0.0 --jinja -fa on --keep 4096 -b 8192 --spec-type ngram-mod --parallel 1 --ctx-checkpoints 24 --checkpoint-every-n-tokens 8192 --cache-ram 65536

submitted by /u/jacek2023
[link] [comments]

Leave a Comment