PSA: Qwen3.6 ships with preserve_thinking. Make sure you have it on.

I had previously posted here about a fix to their 3.5 template to help resolve the KV cache invalidation issue from their template. A lot of you found it useful.

Qwen 3.6 now addresses this with a new preserve_thinking flag. From their model page:

please use "preserve_thinking": True instead of "chat_template_kwargs": {"preserve_thinking": False}.

This capability is particularly beneficial for agent scenarios, where maintaining full reasoning context can enhance decision consistency and, in many cases, reduce overall token consumption by minimizing redundant reasoning. Additionally, it can improve KV cache utilization, optimizing inference efficiency in both thinking and non-thinking modes.

What this means in practice:
The model's previous reasoning now stays in context instead of getting stripped and re-serialized differently on each turn. That was the root cause of the cache invalidation issue. The model should also give better results in agent/tool-calling workflows since it can reference its own prior reasoning instead of starting from scratch each turn.

How to validate that preserve thinking is on:
Simple test: ask the model:
can you come up with two random 20 digit number and validate that they are 20 digits, do not use any tools, and only give me one of the two and nothing else

Ensure the model actually thinks of two numbers otherwise retry, next turn ask:
now give me the second number that you came up with

preserve_thinking: off - the model loses access to its own reasoning from the previous turn. It doesn't remember generating two numbers and tells you there's no second number to share.

preserve_thinking: on - the model can reference its prior thinking, remembers both numbers, and gives you the second one immediately.

Status:
So far I've confirmed LMStudio does not yet support it. I have an open PR on oMLX to add support for it on oMLX

submitted by /u/onil_gova
[link] [comments]

Leave a Comment