What solutions are you using to boost TPS and Context Window?

Server Specs:

16 GB DDR5

AMD Ryzen 5 7600X 4.7 GHz 6-Core Processor

AMD Radeon Sapphire Nitro+ 7900XTX

NZXT N7 B650E ATX AM5 Motherboard

Performance:

I'm running Qwen27b Q4 at 80k context on a Sapphire Nitro+ Radeon 7900XTX 24GB at 40 t/s. My setup is llama.cpp + Vulkan.

Question:

I've been having a blast with it, but it's time for some extra power under the hood. Generation is just slow enough to be annoying with tooling, and the context window is just too short to handle even the smaller "big" tasks.

In a perfect world I'd be running 120-140k context at 60 t/s. Hardware upgrades aside, what software changes have you guys found that work?
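
For anyone sizing this up, here's a rough sketch of the KV cache arithmetic that decides how much context fits in 24 GB. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real values for this model (those should come from the GGUF metadata), and the formula assumes plain full attention with grouped-query attention in every layer. The bytes-per-element figures follow the ggml block layouts for f16 and q8_0 (32 int8 values plus one fp16 scale per block).

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    """Estimate KV cache size in bytes for a full-attention transformer.

    Each token stores one K and one V vector of n_kv_heads * head_dim
    elements in every layer, hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# HYPOTHETICAL model dimensions, for illustration only.
# Replace with the values reported in your model's GGUF metadata.
N_LAYERS = 48
N_KV_HEADS = 4
HEAD_DIM = 128

# Storage cost per cached element.
BYTES_PER_ELEM = {
    "f16": 2.0,         # 16-bit float
    "q8_0": 34 / 32,    # 32 int8 values + one fp16 scale per block
}

GIB = 1024 ** 3

if __name__ == "__main__":
    for n_ctx in (80_000, 120_000, 140_000):
        for cache_type, bpe in BYTES_PER_ELEM.items():
            size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_ctx, bpe)
            print(f"ctx={n_ctx:>7} cache={cache_type:<5} ~{size / GIB:.2f} GiB")
```

The cache grows linearly with context, so going from 80k to 140k costs 1.75x the cache memory at the same cache type, while switching the cache from f16 to q8_0 roughly halves it. As far as I know, llama.cpp exposes this through its `--cache-type-k` and `--cache-type-v` options; whether the speed and quality hold up on Vulkan is something to measure rather than assume.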

submitted by /u/NetTechMan