Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090
UPDATED (POST b9200) TL;DR If you're running rigid agent frameworks locally with mtp on consumer hardware: drop your draft window to 3, lock parallel slots to 1, and compile to b9200 or newer to get your memory bandwidth back. The numbers back it u…