LocalLLaMA

How does Pi coding agent control Qwen’s thinking verbosity? (Qwen 35B A3B, llama-server)

I'm running Qwen 35B A3B via llama-server with the reasoning budget set to -1 (unlimited) for testing. In every other client I've tried, the model thinks endlessly before responding. But with Pi, it does the bare minimum of thinking and still responds …