For the 5 people here running vLLM on multiple R9700s, you need to patch in support for AITER Unified Attention.
I have a 4 x R9700 system on a Threadripper Pro, but I have never been happy with the performance of my GPUs in vLLM. I have started benchmarking every new model I try with llama-benchy so that I can get a better idea of how models of different sizes a…