Help running Qwen3-Coder-Next TurboQuant (TQ3) model

I found a TQ3-quantized version of Qwen3-Coder-Next here:
https://huggingface.co/edwardyoon79/Qwen3-Coder-Next-TQ3_0

According to the model page, it requires a compatible inference engine that supports TurboQuant. The page also provides a llama-server command, but it doesn't clearly specify which version or fork of llama.cpp should be used (or maybe I missed it).

I’ve tried the following llama.cpp forks that claim to support TQ3, but none of them worked for me:

https://github.com/TheTom/llama-cpp-turboquant
https://github.com/turbo-tan/llama.cpp-tq3
https://github.com/drdotdot/llama.cpp-turbo3-tq3

If anyone has successfully run this model, I’d really appreciate it if you could share how you did it.

submitted by /u/UnluckyTeam3478