If yes, which quantized model are you using, and what's your vllm serve command?
I've been struggling to get that model up and running on my DGX Spark GB10. I tried the Intel int4 quant of the 31B and it seems to work well, but it's far too slow.
Anyone have any luck with the 26B?
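For anyone comparing notes, a minimal `vllm serve` invocation might look like the sketch below. The model ID is a placeholder and the flags are just common single-GPU settings, not the OP's actual command; vLLM normally auto-detects the quantization format (GPTQ/AWQ/AutoRound exports) from the model's config:

```
# <hf-model-id> is a placeholder for the quantized checkpoint on Hugging Face.
# Flags shown are assumptions about a typical single-GPU setup.
vllm serve <hf-model-id> \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90
```

On memory-constrained hardware like the GB10, lowering `--max-model-len` or `--gpu-memory-utilization` is usually the first knob to turn.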