Server Specs:
16 Gigs DDR5
AMD Ryzen 5 7600X 4.7 GHz 6-Core Processor
Sapphire Nitro+ AMD Radeon RX 7900 XTX
NZXT N7 B650E ATX AM5 Motherboard
Performance:
I'm running Qwen 27B at Q4 with an 80k context on a Sapphire Nitro+ Radeon RX 7900 XTX (24 GB) at 40 t/s. My setup is llama.cpp + Vulkan.
Question:
I've been having a blast with it, but it's time for some extra power under the hood. The generation speed is just slow enough to be annoying with tool calling, and the context window is just short enough that it can't handle even the smaller of the big tasks.
In a perfect world I'd be running 120-140k context at 60 t/s. Hardware upgrades aside, what software changes have you found that work?