LocalLLaMA

[Release] Carnice-9b-W8A16-AWQ – AWQ Quantization Optimized for vLLM + Marlin on Ampere GPUs (Single-GPU)

Hey r/LocalLLaMA, I'm releasing my first model quantization: an 8-bit symmetric AWQ (W8A16) build of kai-os/Carnice-9b, optimized for single-GPU inference on Ampere GPUs (RTX 30-series) with vLLM and the Marlin kernel. kai-os/Carnice-…
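For anyone who wants to try it, a minimal launch sketch is below. The repo id `kai-os/Carnice-9b-W8A16-AWQ` is assumed from the post title, so check the actual Hugging Face path before running; on recent vLLM builds, passing `--quantization awq` on an Ampere card should let vLLM pick the Marlin-backed AWQ kernel automatically when the checkpoint is compatible, but kernel selection details vary by vLLM version.

```shell
# Sketch, not a verified command line: model id assumed from the title.
pip install vllm

# Serve the quant on a single Ampere GPU. vLLM should detect the AWQ
# checkpoint and, where supported, route it to the Marlin kernel.
vllm serve kai-os/Carnice-9b-W8A16-AWQ \
  --quantization awq \
  --dtype float16 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192
```

If your vLLM version logs that the model is convertible to `awq_marlin`, you can pass `--quantization awq_marlin` explicitly; otherwise the plain `awq` path still works, just without the Marlin speedup.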