/u/MexInAbu - Provide.ai

What do you consider to be the minimum performance (t/s) for local Agent workflows?

/u/MexInAbu / April 25, 2026

What would you say is the minimum number of tokens per second you would tolerate for your local agent workflows? I have been trying pi.dev connected to a llama.cpp instance running Qwen3.6-27B-Q6_K_L with a 200K context on an RTX A6000. I get ab…
