LocalLLaMA

Thinking with a smaller model to speed things up?

Question: Can I do the thinking with a smaller model, like Gemma 3 4B, then use that output as the prompt for Gemma 3 27B, to speed things up? Has anyone done this and measured whether it's worth it?

submitted by /u/q-admin007
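The idea amounts to a two-stage pipeline: the small model drafts the chain-of-thought, and the large model only has to verify and condense it into a final answer. A minimal sketch of the prompt flow, where the `generate_small`/`generate_large` functions are hypothetical placeholders you would replace with real calls to your inference server (e.g. an OpenAI-compatible llama.cpp or Ollama endpoint):

```python
# Sketch of a "think small, answer big" pipeline. The generate_* functions
# below are stand-ins, NOT a real API; swap in actual client calls to the
# two models to use this.

def generate_small(prompt: str) -> str:
    """Placeholder for the small model (e.g. a 4B) drafting reasoning."""
    return f"<small-model reasoning for: {prompt}>"

def generate_large(prompt: str) -> str:
    """Placeholder for the large model (e.g. a 27B) producing the answer."""
    return f"<large-model answer to: {prompt}>"

def answer_with_borrowed_thinking(question: str) -> str:
    # Stage 1: the small, fast model writes the step-by-step reasoning.
    reasoning = generate_small(
        "Think step by step about this question. "
        "Output only your reasoning, not a final answer.\n\n" + question
    )
    # Stage 2: the large model gets the question plus the draft reasoning,
    # so it verifies/corrects instead of reasoning from scratch.
    final_prompt = (
        f"Question: {question}\n\n"
        f"Draft reasoning from a smaller model:\n{reasoning}\n\n"
        "Using (and correcting where needed) that reasoning, "
        "give the final answer."
    )
    return generate_large(final_prompt)
```

Whether this actually saves time depends on how often the large model must redo flawed reasoning from the small one; if the draft is frequently wrong, the second pass can cost more tokens than thinking natively in the large model.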