What do you consider to be the minimum performance (t/s) for local Agent workflows?

What would you say is the minimum number of tokens per second you would tolerate for your local agent workflows?

I have been trying pi.dev connected to a llama.cpp instance running Qwen3.6-27B-Q6_K_L with 200K context on an RTX A6000. I get about 26 t/s, and it is surprisingly usable. It's about the same user experience I get with Claude Code connected to Anthropic. But I have just been fooling around with relatively simple prompts so far. I'm trying out the Brave Search API.
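If you want to put a number on your own setup, here is a minimal sketch of measuring client-side t/s against a llama.cpp `llama-server` instance, assuming its OpenAI-compatible `/v1/chat/completions` endpoint; the `base_url`, model name, and prompt are placeholders for your local configuration:

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Wall-clock decode throughput in tokens per second."""
    return completion_tokens / elapsed_s


def measure(base_url: str = "http://localhost:8080",
            prompt: str = "Explain KV cache in one paragraph.") -> float:
    """Send one non-streaming request and compute t/s from the reported
    completion token count and the measured wall-clock time.
    Assumes an OpenAI-compatible server (e.g. llama.cpp's llama-server)."""
    body = json.dumps({
        "model": "local",  # llama-server ignores this for a single loaded model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    t0 = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    elapsed = time.monotonic() - t0
    return tokens_per_second(data["usage"]["completion_tokens"], elapsed)
```

Note this measures end-to-end throughput (prompt processing plus generation), so for long contexts it will read lower than the server's own generation-only figure.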

submitted by /u/MexInAbu
