Gemma 4 has a systemic attention failure. Here’s the proof.

I've spent months building a diagnostic method for large language models. It catches what standard benchmarks miss: distributional collapse inside individual tensors, not just aggregate loss or perplexity.

Gemma 4 26B A4B fails it.

I analyzed Unsloth's Q8_0 quant of Gemma 4 26B A4B and found 29 tensors with Kullback-Leibler (KL) distribution drift. 21 of them are attention layers (attn_k, attn_q, attn_v).

Full log: https://pastebin.com/7SDqaMqA

Samples:

Tensor          KL Before   KL After
blk.8.attn_k    0.2201      0.0006
blk.17.attn_q   0.1672      0.0001
blk.23.attn_q   0.1672      0.0001
blk.19.attn_k   0.0975      0.0001
blk.12.attn_k   0.0890      0.0006
blk.22.attn_k   0.0879      0.0004
blk.28.attn_k   0.0791      0.0007
blk.8.attn_q    0.0530      0.0002
blk.6.attn_k    0.0490      0.0001
blk.15.attn_q   0.0482      0.0003
blk.1.attn_k    0.0474      0.0006

Normal range is below 0.02. These tensors sit roughly 2x to 11x above that threshold.
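The post doesn't include the diagnostic code, so here is a minimal sketch of how per-tensor KL drift can be estimated: bin a tensor's values before and after quantization into a shared histogram and compute KL divergence between the two empirical distributions. The function name, bin count, and the crude rounding stand-in for Q8_0 dequantization are all my assumptions, not the author's method.

```python
import numpy as np

def kl_drift(orig: np.ndarray, quant: np.ndarray, bins: int = 256) -> float:
    """Estimate KL(P_orig || P_quant) between the value distributions
    of a tensor before and after quantization, using shared histogram bins."""
    lo = min(orig.min(), quant.min())
    hi = max(orig.max(), quant.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(orig, bins=edges)
    q, _ = np.histogram(quant, bins=edges)
    eps = 1e-10  # smoothing so empty bins don't blow up the log
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

# toy check: a synthetic weight tensor vs. a coarsely rounded copy
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=100_000).astype(np.float32)
w_rounded = np.round(w / 0.001) * 0.001  # hypothetical stand-in for dequantized Q8_0
print(kl_drift(w, w))          # identical tensors: drift is 0
print(kl_drift(w, w_rounded))  # rounding introduces a small positive drift
```

In this framing, a value above the ~0.02 baseline would flag a tensor whose value distribution was visibly reshaped by quantization rather than merely perturbed.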

Gemma 4's attention mechanism shows systemic drift. The model was released broken.

submitted by /u/EvilEnginer
