DeepSeek 3.2 eating the opening think tag on llama.cpp server?

Hey guys. Having a weird issue with the new DeepSeek V3.2 Unsloth GGUF via llama-server. The model starts reasoning fine, but the opening think tag is missing from the output stream: I just see the plain-text reasoning, and then the closing tag at the end.

Because of this, Open WebUI doesn't collapse the thought block. I'm on a 512GB box, and the command is just llama-server -m model_name -t 32 --flash-attn on. I tried toggling reasoning on/off, which didn't help.
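For what it's worth, here's roughly how I've been checking the raw output directly against llama-server (bypassing Open WebUI) to see what the server actually emits; the port is the default 8080 and the prompt is just a placeholder:

    # Query llama-server's OpenAI-compatible endpoint and pretty-print the
    # response so the think tags (or lack of them) are visible as-is.
    curl -s http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "stream": false
      }' | python3 -m json.tool

The content field comes back with the reasoning text but no opening tag, same as what Open WebUI shows.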

Is the chat template broken in these specific GGUFs or am I missing a flag?

submitted by /u/Winter_Engineer2163