LocalLLaMA

How do I get the super-fast DFlash / MTP tokens-per-second numbers I’m seeing on here? (Dual 3090s)

I'm trying to reach the high tokens-per-second numbers people are posting on here with the new speculative decoding techniques. Hardware: 2×3090, AMD 9900X, 32GB RAM, Gigabyte B850 AI TOP. Running Ubuntu 24.04, CUDA 13.0, NVIDIA-SMI 580.105.08. I'm r…