LocalLLaMA

Mistral Medium 3.5 128B and Qwen 3.5 122B A10B on 4x RTX 3080 20GB

Mistral Medium 3.5 128B on 4×3080 20GB with layer split: CUDA_VISIBLE_DEVICES=0,1,2,3 ./build/bin/llama-bench --model /data/huggingface/Mistral-Medium-3.5-GGUF/Mistral-Medium-3.5-128B-IQ4_XS-00001-of-00003.gguf -ngl 99 -d 0,16384 -fa 1 --split-mode …
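The command above is cut off after `--split-mode`, so for anyone wanting to reproduce something similar, here is a hedged sketch of what a complete layer-split llama-bench run typically looks like in llama.cpp. The model path and the equal `--tensor-split` ratios are my assumptions, not the OP's exact settings:

```shell
# Sketch only, not the OP's exact command.
# -ngl 99            : offload all layers to the GPUs
# -d 0,16384         : benchmark at context depths 0 and 16384
# -fa 1              : enable flash attention
# --split-mode layer : place whole layers per GPU ("row" splits within layers)
# --tensor-split     : share of the model per GPU (assumed equal here)
CUDA_VISIBLE_DEVICES=0,1,2,3 ./build/bin/llama-bench \
  --model /path/to/Mistral-Medium-3.5-128B-IQ4_XS-00001-of-00003.gguf \
  -ngl 99 -d 0,16384 -fa 1 \
  --split-mode layer --tensor-split 1,1,1,1
```

With sharded GGUF files, pointing `--model` at the first `-00001-of-000NN` shard is enough; llama.cpp picks up the remaining shards automatically.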