Ran some llama.cpp RPC tests to see if it's worth it, and whether 10GbE is needed.


Let me first say I'm not doing anything with parallelism, so if that's what you're after, these benchmarks and tests aren't for you.

That said, if you're a hobbyist like me who's been wondering whether you can use the GPUs in your other PCs, I have some answers, though I'm still learning. There is probably a better config for llama.cpp, but I haven't seen any huge gains; in fact, flash attention seemed to slow things down a bit, so I didn't test with it on. Also, I'm sure someone with better-than-consumer networking could get their latency down further, which should improve things. I just don't have that kind of hardware.

My main AI PC (see GPU details below) acts as the main node for these tests. The 2nd PC has a 5070 and a 3080; I tested it on Windows 11, WSL, and native Linux. And for fun, one run included a 3rd PC with a 5060 Ti 16GB. Here are the results.

I double-checked that the RPC server was in fact being used on each run.

Starting off with the main PC only, as a control to see how RPC compares. You can see my config and hardware used. For some reason I didn't need to rearrange my GPU order for llama-bench to work well. In all my tests this PC is the main node, running Linux Mint with Nvidia driver 590.48.0.1 and CUDA toolkit 13.1 on a 2.5gbe connection.
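For anyone who wants to try the same thing, the basic llama.cpp RPC setup looks roughly like this. The IP addresses, port, and model path are placeholders for my LAN, not my exact config; check the llama.cpp RPC example docs for your build:

```shell
# On each worker PC: build llama.cpp with the RPC backend enabled
cmake -B build -DGGML_CUDA=ON -DGGML_RPC=ON
cmake --build build --config Release

# Start the RPC server on each worker, listening on the LAN
# (0.0.0.0 binds all interfaces; 50052 is the default port)
./build/bin/rpc-server -H 0.0.0.0 -p 50052

# On the main PC: point llama-bench (or llama-server) at the workers,
# offloading all layers and listing each worker as host:port
./build/bin/llama-bench -m model.gguf -ngl 99 \
  --rpc 192.168.1.21:50052,192.168.1.22:50052
```

Checking that the worker's console logs connections is an easy way to confirm the RPC backend is actually being used.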

Edit: in case people don't want to do the math, that's 120GB of VRAM on the main PC, 22GB on the 2nd PC, and 16GB on the 3rd PC.
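The math, for the curious (the per-card splits on the 2nd PC are my assumption from the stock 12GB 5070 and 10GB 3080):

```python
# VRAM per machine in GB, per the figures in the post
vram_gb = {
    "main": 120,        # main AI PC
    "second": 12 + 10,  # 5070 (12GB) + 3080 (10GB) = 22GB
    "third": 16,        # 5060 Ti 16GB
}
total = sum(vram_gb.values())
print(total)  # 158
```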

Control

The 2nd PC is running native Linux on a 2.5gbe connection.

2nd PC is running 5070 & 3080

Next is the same setup but with a 1gbe connection.

https://preview.redd.it/o877jcagxd0h1.png?width=1268&format=png&auto=webp&s=f8298f9d0faa4653e200c70fcbc715a051e5619a

Windows 11, Nvidia driver 595, CUDA toolkit 13.1, 2.5gbe connection.

2nd PC is running 5070 & 3080

WSL with Nvidia driver 595, CUDA toolkit 13.1, 2.5gbe connection.

5070 & 3080

Same as above but used a 1gbe connection.

https://preview.redd.it/vhl1ujsvyd0h1.png?width=1246&format=png&auto=webp&s=fdb0d6f52f7010a3434497972effe94561119323

Still using WSL, back on 2.5gbe, but using only the 3080.

3080 only

Same specs but only the 5070 this time around.

5070 only

Same as above but on a 1gbe connection.

5070 only - 1gbe connection

Finally, I thought I would throw a 3rd PC into the mix. The 2nd PC is running both GPUs in native Linux for this test. The 3rd PC is running Windows 11 with a 5060 Ti 16GB on a 2.5gbe connection.

https://preview.redd.it/xcdbzm1szd0h1.png?width=1278&format=png&auto=webp&s=c8d8f79a7c5fcc3e535c03379a555c8dd4090e6e

I don't know if the Windows issue is because the 3080 is running as the primary GPU for Windows, but I've had a lot of weird issues with Windows. The main takeaway after testing is that RPC is quite viable, at least with a smaller context, and a lot better when both machines are running Linux. I'm waiting for some parts so I can add the 5060 Ti to the 2nd PC for larger context, and I'm curious how it might scale up from there.

Oh, and on a side note: I did have an issue on Linux because it installed a generic network driver. I was getting pings of around 1.5-3ms, but this was fixed before the tests.
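If anyone hits the same thing, this is roughly how to spot it. The interface name and worker IP below are placeholders, not my actual setup:

```shell
# Show which kernel driver the NIC is using;
# a generic driver here can explain unexpectedly high latency
ethtool -i enp5s0

# Measure round-trip latency to an RPC worker; on a healthy wired
# 2.5gbe LAN this should be well under 1ms, not 1.5-3ms
ping -c 10 -q 192.168.1.21
```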

submitted by /u/lemondrops9
