How many of you have tried BeeLlama.cpp? How is it? Is agentic coding possible with 8GB VRAM?

We'll get those features (check the link at the bottom) on mainline sooner or later anyway. But for now this fork could be useful for seeing the full potential of our poor GPUs (and of big GPUs too).

Any 8GB VRAM (and 32GB RAM) folks already doing agentic coding with models (at Q4 at least) like Qwen3.6-35B-A3B, Qwen3.6-27B, Gemma-4-31B, or Gemma-4-26B-A4B? I'd love to see some t/s stats, full commands, and more details on that. I'm not expecting any miracles with 8GB VRAM, but I still want to do something decent within those constraints. Even though I'm getting a new rig this month, I want to keep using my current laptop (8GB VRAM) for agentic coding too.
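For anyone sharing commands: the usual mainline llama.cpp trick for fitting a MoE model into 8GB VRAM is to offload all layers to the GPU but keep the expert tensors in CPU RAM, since that's where most of the weights in A3B-style models live. A rough sketch only (the model filename, context size, and port are placeholders; flag availability varies by build, so check `llama-server --help` on yours):

```shell
# Sketch, not a tuned config: filename and numbers are placeholders.
#   -ngl 99        : offload all layers to the GPU...
#   --n-cpu-moe 99 : ...then keep the MoE expert tensors of all layers in
#                    CPU RAM, leaving attention/KV work on the 8GB GPU
#   -c 16384       : context size; agentic tools usually want a large one,
#                    but KV cache competes for the same 8GB
llama-server \
  -m model-Q4_K_M.gguf \
  -ngl 99 \
  --n-cpu-moe 99 \
  -c 16384 \
  --port 8080
```

If it fits with room to spare, lowering `--n-cpu-moe` moves some expert layers back onto the GPU for extra t/s.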

Others (those with more than 8GB VRAM), please share your stats, full commands, and comparisons with mainline.

Below is a related thread by the creator. I hope the creator keeps adding features.

submitted by /u/pmttyji