Gemma 4 has a systemic attention failure. Here’s the proof.

I've spent months building a diagnostic method for large language models. It catches what standard benchmarks miss: distributional collapse inside individual tensors, not just aggregate loss or perplexity.

Gemma 4 26B A4B fails it.

I analyzed Unsloth's Q8_0 quant of Gemma 4 26B A4B and found 29 tensors with Kullback-Leibler (KL) distribution drift. 21 of them are attention layers (attn_k, attn_q, attn_v).

Full log: https://pastebin.com/7SDqaMqA

Samples:

Tensor          KL Before   KL After
blk.8.attn_k    0.2201      0.0006
blk.17.attn_q   0.1672      0.0001
blk.23.attn_q   0.1672      0.0001
blk.19.attn_k   0.0975      0.0001
blk.12.attn_k   0.0890      0.0006
blk.22.attn_k   0.0879      0.0004
blk.28.attn_k   0.0791      0.0007
blk.8.attn_q    0.0530      0.0002
blk.6.attn_k    0.0490      0.0001
blk.15.attn_q   0.0482      0.0003
blk.1.attn_k    0.0474      0.0006

Normal range is below 0.02. These tensors sit roughly 2x to 11x above that threshold.
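The post doesn't include the diagnostic code, so here is a minimal sketch of how per-tensor KL drift can be estimated: bin a tensor's values before and after quantization into a shared histogram and compute KL divergence between the two empirical distributions. The function name, bin count, and the crude rounding stand-in for Q8_0 dequantization are all my assumptions, not the author's method.

```python
import numpy as np

def kl_drift(orig: np.ndarray, quant: np.ndarray, bins: int = 256) -> float:
    """Estimate KL(P_orig || P_quant) between the value distributions
    of a tensor before and after quantization, using shared histogram bins."""
    lo = min(orig.min(), quant.min())
    hi = max(orig.max(), quant.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(orig, bins=edges)
    q, _ = np.histogram(quant, bins=edges)
    eps = 1e-10  # smoothing so empty bins don't blow up the log
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

# toy check: a synthetic weight tensor vs. a coarsely rounded copy
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=100_000).astype(np.float32)
w_rounded = np.round(w / 0.001) * 0.001  # hypothetical stand-in for dequantized Q8_0
print(kl_drift(w, w))          # identical tensors: drift is 0
print(kl_drift(w, w_rounded))  # rounding introduces a small positive drift
```

In this framing, a value above the ~0.02 baseline would flag a tensor whose value distribution was visibly reshaped by quantization rather than merely perturbed.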

Gemma 4's attention mechanism shows systemic drift. The model was released broken.

submitted by /u/EvilEnginer
