I've spent months building a diagnostic method for large language models. It catches what standard benchmarks miss: distributional collapse inside individual tensors, not just loss or perplexity.
Gemma 4 26B A4B fails it.
I analyzed Unsloth's Q8_0 quant of Gemma 4 26B A4B and found 29 tensors with KL (Kullback-Leibler) divergence drift. 21 of them are attention tensors (attn_k, attn_q, attn_v).

Full log: https://pastebin.com/7SDqaMqA
Sample results:
| Tensor | KL Before | KL After |
|---|---|---|
| blk.8.attn_k | 0.2201 | 0.0006 |
| blk.17.attn_q | 0.1672 | 0.0001 |
| blk.23.attn_q | 0.1672 | 0.0001 |
| blk.19.attn_k | 0.0975 | 0.0001 |
| blk.12.attn_k | 0.0890 | 0.0006 |
| blk.22.attn_k | 0.0879 | 0.0004 |
| blk.28.attn_k | 0.0791 | 0.0007 |
| blk.8.attn_q | 0.0530 | 0.0002 |
| blk.6.attn_k | 0.0490 | 0.0001 |
| blk.15.attn_q | 0.0482 | 0.0003 |
| blk.1.attn_k | 0.0474 | 0.0006 |
Normal range is below 0.02; the tensors above sit roughly 2x to 11x over that threshold.
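For readers who want to reproduce this kind of check, here is a minimal sketch of measuring KL drift between the value histograms of two weight tensors. This is my reconstruction, not the author's actual tool: the histogram binning, smoothing, and all function and variable names are assumptions; only the 0.02 threshold comes from the post.

```python
import numpy as np

DRIFT_THRESHOLD = 0.02  # "normal range: below 0.02" per the post

def kl_drift(reference, quantized, bins=256):
    """KL(P || Q) between value histograms of two weight tensors.

    Hypothetical reconstruction of the drift check described above.
    """
    lo = min(reference.min(), quantized.min())
    hi = max(reference.max(), quantized.max())
    p, _ = np.histogram(reference, bins=bins, range=(lo, hi))
    q, _ = np.histogram(quantized, bins=bins, range=(lo, hi))
    # Add-one smoothing so empty bins don't blow up the log ratio.
    p = (p + 1.0) / (p.sum() + bins)
    q = (q + 1.0) / (q.sum() + bins)
    return float(np.sum(p * np.log(p / q)))

# Usage: flag a tensor whose value distribution has drifted.
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 100_000).astype(np.float32)
ok = ref + rng.normal(0.0, 0.01, ref.shape).astype(np.float32)  # mild noise
bad = ref * 0.5                                                 # collapsed scale
print(kl_drift(ref, ok) < DRIFT_THRESHOLD)   # small perturbation stays in range
print(kl_drift(ref, bad) > DRIFT_THRESHOLD)  # distribution shift is flagged
```

The point of the histogram-based KL here is that it catches a change in the *shape* of a tensor's value distribution (e.g. a collapsed scale) even when summary metrics like mean error look small.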
The Gemma 4 attention mechanism shows systemic drift in this quant. The model was released broken.