/u/pilibitti - Provide.ai

How does Pi coding agent control Qwen’s thinking verbosity? (Qwen 35B A3B, llama-server)

/u/pilibitti / May 17, 2026

I'm running Qwen 35B A3B via llama-server with the reasoning budget set to -1 (unlimited) for testing. In every other client I've tried, the model thinks endlessly before responding. But with Pi, it does the bare minimum of thinking and still responds …
