LocalLLaMA

Gemma 4 26B A4B runs easily on 16GB Macs

Typically, models in the 26B class are difficult to run on 16GB Macs because GPU acceleration requires the accelerated layers to sit entirely within wired memory. It's possible with aggressive quants (2-bit, or maybe a very lightweight I…
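The memory math behind that claim can be sketched with a rough back-of-envelope calculation. This is an illustrative approximation only: it counts weight storage and ignores KV cache, activations, and the macOS wired-memory ceiling, all of which eat into the nominal 16GB.

```python
# Rough weight footprint of a 26B-parameter model at various
# quantization bit-widths. Decimal GB; ignores KV cache, activations,
# and quantization metadata overhead (all assumptions for illustration).

def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB for params_b billion params."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4, 2):
    print(f"26B @ {bits}-bit ~ {model_size_gb(26, bits):.1f} GB")
# 16-bit ~ 52 GB, 8-bit ~ 26 GB, 4-bit ~ 13 GB, 2-bit ~ 6.5 GB
```

The 4-bit figure alone nearly fills a 16GB machine once the OS and KV cache are accounted for, which is why only 2-bit (or similarly aggressive) quants leave enough wired-memory headroom.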

LocalLLaMA

What counts as RAG?

I have always considered the term RAG to be a hype term. To me, Retrieval-Augmented Generation just means the model retrieves the data, interprets it based on what you requested, and responds with the data in context, meaning any agentic system that has …
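The broad definition the poster is gesturing at reduces to a retrieve-then-prompt loop. A minimal sketch, where the keyword-overlap retriever and the document list are purely illustrative stand-ins (any vector search or agentic tool call would slot into `retrieve`):

```python
# Bare-bones RAG mechanics: retrieve relevant docs, stuff them into the
# prompt, hand the prompt to a model. The retriever here is naive
# keyword overlap -- an assumption for illustration, not a real system.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by shared lowercase words with the query, return top k."""
    q = set(query.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the augmented prompt a generator model would receive."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Gemma models are released by Google.",
    "Wired memory limits GPU offload on macOS.",
    "RAG augments a prompt with retrieved context.",
]
print(build_prompt("what is RAG", docs))
```

Under this framing the debate is only about *what* does the retrieving: swap the keyword ranker for an embedding search or a tool-calling agent and the generation step is unchanged.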

LocalLLaMA

Gemma 4 small model comparison

I know that Artificial Analysis is not everyone's favorite benchmarking site, but it's a data point. I was particularly interested in how well Gemma 4 E4B performs against comparable models on hallucination rate and intelligence/output …
