LocalLLaMA

[Release] Carnice-9b-W8A16-AWQ – AWQ Quantization Optimized for vLLM + Marlin on Ampere GPUs (Single-GPU)

Hey r/LocalLLaMA, I'm releasing my first model quantization: an 8-bit symmetric AWQ (W8A16) build of kai-os/Carnice-9b, optimized for single-GPU inference on Ampere GPUs (RTX 30-series) with vLLM and the Marlin kernel. kai-os/Carnice-…
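For anyone who wants to try it, a minimal launch sketch is below. The repo id `kai-os/Carnice-9b-W8A16-AWQ` is assumed from the post title, so check the actual Hugging Face path before running; on recent vLLM builds, passing `--quantization awq` on an Ampere card should let vLLM pick the Marlin-backed AWQ kernel automatically when the checkpoint is compatible, but kernel selection details vary by vLLM version.

```shell
# Sketch, not a verified command line: model id assumed from the title.
pip install vllm

# Serve the quant on a single Ampere GPU. vLLM should detect the AWQ
# checkpoint and, where supported, route it to the Marlin kernel.
vllm serve kai-os/Carnice-9b-W8A16-AWQ \
  --quantization awq \
  --dtype float16 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192
```

If your vLLM version logs that the model is convertible to `awq_marlin`, you can pass `--quantization awq_marlin` explicitly; otherwise the plain `awq` path still works, just without the Marlin speedup.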