LocalLLaMA

Acceptable prompt processing speed for you?

I am currently optimising some ancient hardware (4x V100s) to run Qwen3, but the lack of flash attention means that prompt processing really starts to slow down at longer contexts. For agentic coding work, what processing speeds and context lengths d…