LocalLLaMA

Most people seem obsessed with token generation speed, but isn’t prefill the real bottleneck? Am I missing something?

I read this sub every day, and I keep seeing benchmarks and discussions focused almost entirely on generation speed (tokens/s). Prompt processing (prefill) speed barely gets mentioned. From my own experience running a bunch of different models on different GPUs for…