backend-agnostic tensor parallelism has been merged into llama.cpp

if you have more than one GPU, your models can now run much faster

-sm layer is the default behaviour; -sm tensor is the new mode to try

"backend-agnostic" means you don't need CUDA to enjoy this

submitted by /u/jacek2023
