As per the title: assuming you run both with the same context size and quantization in llama.cpp, is there any difference in VRAM usage?