/u/wbulot - Provide.ai

Most people seem obsessed with token generation speed, but isn’t prefill the real bottleneck? Am I missing something?

/u/wbulot / May 6, 2026

I read this sub every day and I keep seeing benchmarks and discussions focused almost entirely on tokens/s generation speed. Prompt processing speed barely gets mentioned. From my own experience running a bunch of different models on different GPUs for…
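To put rough numbers on the question, here is a minimal back-of-the-envelope sketch. It assumes end-to-end latency is simply prefill time plus generation time, with each phase running at a constant speed. The function name `latency_breakdown` and every number in it are hypothetical and chosen for illustration only; they are not measurements from any model or GPU.

```python
def latency_breakdown(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Split a request's latency into prefill and generation time.

    Simplifying assumption: total time = prompt_tokens / prefill speed
    + output_tokens / generation speed, both speeds constant.
    """
    prefill_s = prompt_tokens / prefill_tps   # time to process the prompt
    decode_s = output_tokens / decode_tps     # time to generate the reply
    total_s = prefill_s + decode_s
    return {
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": total_s,
        "prefill_share": prefill_s / total_s,  # fraction of time spent in prefill
    }


# Hypothetical long-context request: big prompt, short answer.
r = latency_breakdown(
    prompt_tokens=32_000,  # assumed prompt length
    output_tokens=500,     # assumed reply length
    prefill_tps=400.0,     # assumed prompt processing speed (tokens/s)
    decode_tps=25.0,       # assumed generation speed (tokens/s)
)
print(f"prefill: {r['prefill_s']:.0f}s, generation: {r['decode_s']:.0f}s, "
      f"prefill share: {r['prefill_share']:.0%}")
```

With these made-up inputs, prefill accounts for most of the wait. Swap in a short prompt and a long reply and the balance flips, so which phase matters more depends on the workload.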
