backend-agnostic tensor parallelism has been merged into llama.cpp

if you have more than one GPU, your models can now run much faster

-sm layer is the default behaviour; -sm tensor is the new mode to try

"backend-agnostic" means you don't need CUDA to enjoy this

submitted by /u/jacek2023
