LocalLLaMA

FPGAs for speculative decoding

Anyone who knows about FPGAs:
– What is the maximum model size one can be designed for? (I've read 20-30M parameters max; is it possible to go higher if the model is quantized, at a reasonable price?)
– Taalas: is what they're doing with ASICs more viable (rum…