/u/WhatererBlah555 - Provide.ai

Is there a way to mitigate the performance drop as context grows?

/u/WhatererBlah555 / April 26, 2026

In my local LLM setup I get 30 to 80 t/s generation at the beginning, but it drops quite a lot as the context grows. I use llama.cpp with the Vulkan backend on an MI50 and a V100. Are there any command-line flags that can help with this? Or some good practice ot…
