I gave the same prompt and the same document to two setups: a 1660 Ti running Gemma 4 e2b q4 (because of its small VRAM) and an 890M iGPU running Gemma 4 e4b q8. Prefill (prompt processing before token generation starts) was about 4-5 times faster on the 890M iGPU. Token generation was about 20 t/s on the 1660 Ti versus about 9 t/s on the 890M. Both were using LM Studio, both on KDE 26.04 LTS.
Note the disparity in model size and quantization. Both ran on a full 130,000 tokens because the job was huge. So is AMD really as slow as the many benchmarks I'm seeing suggest?
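To put the two numbers together, here is a rough sketch of the end-to-end time for one run. The generation rates and the 130,000-token prompt are the figures above; the absolute prefill rates and the reply length are made-up placeholders (I only know the iGPU was roughly 4-5x faster at prefill), so treat the result as an illustration, not a measurement.

```python
def total_seconds(prompt_tokens, output_tokens, prefill_tps, gen_tps):
    """End-to-end time: prompt processing (prefill) plus token generation."""
    return prompt_tokens / prefill_tps + output_tokens / gen_tps


PROMPT_TOKENS = 130_000  # prompt size from the post
OUTPUT_TOKENS = 1_000    # hypothetical reply length

# gen_tps values are from the post. prefill_tps values are placeholders,
# chosen only so the iGPU is 4.5x faster at prefill (inside the 4-5x range).
gtx_1660ti = total_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, prefill_tps=100, gen_tps=20)
radeon_890m = total_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, prefill_tps=450, gen_tps=9)

print(f"1660 Ti: {gtx_1660ti:.0f} s total")
print(f"890M:    {radeon_890m:.0f} s total")
```

With a prompt this long, prefill can dominate the total, so under these assumed rates the card with the slower t/s could still finish first. Swap in real prefill numbers from the LM Studio stats to check.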