| I had previously posted here about a fix to their 3.5 template to help resolve the KV cache invalidation issue from their template. A lot of you found it useful. Qwen 3.6 now addresses this with a new preserve_thinking flag. From their model page:
This capability is particularly beneficial for agent scenarios, where maintaining full reasoning context can enhance decision consistency and, in many cases, reduce overall token consumption by minimizing redundant reasoning. Additionally, it can improve KV cache utilization, optimizing inference efficiency in both thinking and non-thinking modes. What this means in practice: How to validate that preserve thinking is on: Ensure the model actually thinks of two numbers otherwise retry, next turn ask: preserve_thinking: off - the model loses access to its own reasoning from the previous turn. It doesn't remember generating two numbers and tells you there's no second number to share. preserve_thinking: on - the model can reference its prior thinking, remembers both numbers, and gives you the second one immediately. Status: [link] [comments] |