(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)?

I am running unsloth/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf with llama-server (with reasoning enabled).

Is it possible to disable reasoning for some requests only? If yes, how?

I want to leave reasoning on by default, but for some use cases I want the model to respond as fast as possible (e.g., a chatbot).
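For context, one possible approach (a sketch, not a confirmed answer): recent llama.cpp builds let an OpenAI-style request forward `chat_template_kwargs` to the chat template, so if the model's template understands an `enable_thinking` flag (as Qwen-style templates do), reasoning can be toggled per request while the server default stays unchanged. The endpoint URL and the assumption that this model's template honors `enable_thinking` are mine, not from the post:

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port to your llama-server instance.
URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_payload(prompt: str, thinking: bool) -> dict:
    """Build an OpenAI-style request body. llama-server forwards
    chat_template_kwargs to the chat template, so templates that
    support enable_thinking can switch reasoning off per request."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        # Per-request override; omit it to keep the server's default.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

def ask(prompt: str, thinking: bool = True) -> str:
    """Send one chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt, thinking)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Fast chatbot-style reply, disabling reasoning for this request only:
# ask("Say hi", thinking=False)
```

Whether this works depends on the GGUF's embedded chat template; if the template ignores `enable_thinking`, the override has no effect.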

submitted by /u/regunakyle
