LocalLLaMA

Will llama.cpp multislot improve speed?

I've heard mostly negative opinions about running llama.cpp with multiple slots (`--parallel` > 1). I'd guess it's worse at this than vLLM, but I recently tried vLLM with 4 slots and it did improve overall throughput significantly (150-170tps dec…
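For anyone wanting to try this themselves, a minimal sketch of what I mean by "slots" (flag names per llama.cpp's `llama-server`; the model path and sizes here are placeholders, not my actual setup):

```shell
# --parallel 4 gives four concurrent request slots.
# Note the total context set by -c is divided among slots,
# so each slot here gets 16384 / 4 = 4096 tokens of context.
# -ngl 99 offloads all layers to the GPU, if one is available.
./llama-server -m model.gguf -c 16384 --parallel 4 -ngl 99
```

The context-splitting behavior is the usual caveat people raise: to keep per-request context constant, you have to multiply `-c` by the slot count, which costs extra VRAM for the KV cache.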