Hello everyone. During a data-debugging session at the per-tensor and per-neuron level, I found that neurons in the tensor layers of an MoE model can die (become all zeros). Here is the log.
For example, in blk.0.ffn_gate_exps.weight and blk.0.ffn_up_exps.weight of the Qwen3.6 35B A3B Q8_0 quant, I found that 40% of the neurons are zero.
In Qwen3.5 9B I didn't find any zero blocks: every block in it contains nonzero values.
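The post doesn't spell out how the zero blocks were detected, so here is a minimal sketch under one assumption: a "zero block" is a Q8_0 block (ggml's layout of one fp16 scale followed by 32 int8 quants, 34 bytes in total) whose values all dequantize to zero. Pulling the raw tensor bytes out of the GGUF file is not shown, and count_zero_q8_0_blocks is a hypothetical helper name.

```python
import struct

QK8_0 = 32                # values per Q8_0 block in ggml
BLOCK_BYTES = 2 + QK8_0   # fp16 scale (2 bytes) + 32 int8 quants = 34 bytes


def count_zero_q8_0_blocks(raw: bytes) -> tuple[int, int]:
    """Return (zero_blocks, total_blocks) for a raw Q8_0 tensor buffer.

    A block counts as zero ("dead") when every value it dequantizes to is 0,
    i.e. its fp16 scale is 0 or all 32 int8 quants are 0.
    """
    if len(raw) % BLOCK_BYTES:
        raise ValueError("buffer length is not a multiple of the Q8_0 block size")
    total = len(raw) // BLOCK_BYTES
    zero = 0
    for start in range(0, len(raw), BLOCK_BYTES):
        (scale,) = struct.unpack_from("<e", raw, start)
        quants = raw[start + 2:start + BLOCK_BYTES]
        # An int8 zero is the byte 0x00, so any() is False only if all quants are 0.
        if scale == 0.0 or not any(quants):
            zero += 1
    return zero, total


# Synthetic example: one healthy block followed by two all-zero blocks.
healthy = struct.pack("<e", 0.01) + bytes(range(1, 33))
dead = struct.pack("<e", 0.01) + bytes(QK8_0)
zero, total = count_zero_q8_0_blocks(healthy + dead + dead)
print(f"{zero}/{total} zero blocks ({zero / total:.0%})")  # 2/3 zero blocks (67%)
```

This is pure standard-library Python for clarity; on a 35B model a vectorized version (for example with a NumPy structured dtype) would be much faster.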
I don't know why this happens. I have never trained an LLM myself, but the problem exists: a company I'm interviewing with independently confirmed these findings using different detection methods. I think this is the main reason LLMs degrade during training.
I fixed the model as much as I could at the binary level on a Google Colab free-tier CPU. I restored the dead neurons (7.5 million zero blocks in the Q8 quant) by copying binary weight data from healthy neighbouring neurons into the dead ones, combined with linear interpolation.
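The exact repair procedure isn't given beyond "copy from healthy neighbours + linear interpolation". Below is a simplified sketch of that idea on dequantized float rows (one row per neuron), not the binary-level Q8_0 procedure used for the released model. heal_dead_rows is a hypothetical name, and treating the nearest healthy rows above and below a dead row as its "neighbours" is an assumption.

```python
def heal_dead_rows(rows: list[list[float]]) -> list[list[float]]:
    """Fill all-zero rows from their nearest healthy neighbours (hypothetical sketch).

    Dead rows between two healthy rows get a linear blend weighted by distance;
    dead rows before the first or after the last healthy row copy that row.
    """
    healthy = [i for i, row in enumerate(rows) if any(v != 0.0 for v in row)]
    healed = [list(row) for row in rows]
    if not healthy:
        return healed  # nothing to copy from
    healthy_set = set(healthy)
    for i in range(len(rows)):
        if i in healthy_set:
            continue
        left = max((h for h in healthy if h < i), default=None)
        right = min((h for h in healthy if h > i), default=None)
        if left is None:
            healed[i] = list(rows[right])
        elif right is None:
            healed[i] = list(rows[left])
        else:
            t = (i - left) / (right - left)  # 0 at the left neighbour, 1 at the right
            healed[i] = [(1 - t) * a + t * b for a, b in zip(rows[left], rows[right])]
    return healed


weights = [[1.0, 2.0], [0.0, 0.0], [3.0, 6.0], [0.0, 0.0]]
print(heal_dead_rows(weights))
# [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [3.0, 6.0]]
```

Interpolating rather than copying keeps a repaired row from being an exact duplicate of one neighbour when healthy rows exist on both sides.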
Here is the fixed GGUF model: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF
A benchmark from a user: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF/discussions/1#69e772a7b01172a7d35fb655
The .safetensors fp8_e4m3fn version: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-Safetensors
I converted the Q8_0 quant to .safetensors with this script: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-Safetensors/raw/main/gguf_to_safetensors.py
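The linked script isn't reproduced here. As a hedged sketch of the first step any Q8_0 to floating-point conversion needs: each Q8_0 value is its block's fp16 scale multiplied by an int8 quant. dequantize_q8_0 is a hypothetical name, and the later cast to fp8_e4m3fn (whose largest finite value is 448) is not shown.

```python
import struct

QK8_0 = 32               # values per Q8_0 block in ggml
BLOCK_BYTES = 2 + QK8_0  # fp16 scale + 32 int8 quants


def dequantize_q8_0(raw: bytes) -> list[float]:
    """Dequantize a raw Q8_0 buffer: value = block scale * int8 quant."""
    if len(raw) % BLOCK_BYTES:
        raise ValueError("buffer length is not a multiple of the Q8_0 block size")
    values: list[float] = []
    for start in range(0, len(raw), BLOCK_BYTES):
        (scale,) = struct.unpack_from("<e", raw, start)                 # little-endian fp16
        quants = struct.unpack_from(f"<{QK8_0}b", raw, start + 2)       # 32 signed int8
        values.extend(scale * q for q in quants)
    return values


block = struct.pack("<e", 0.5) + struct.pack(f"<{QK8_0}b", *range(-16, 16))
vals = dequantize_q8_0(block)
print(vals[0], vals[-1], len(vals))  # -8.0 7.5 32
```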
The uncensored FP8 .safetensors version is trainable: its gradients are alive, with no zeros.
The model is based on this one (thanks to HauhauCS for the amazing job): https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
System prompt: https://pastebin.com/pU25DVnB
Chat template: https://pastebin.com/Dy2fmmpN
Recommended quants: MXFP4_MOE and Q8_0
Recommended Settings (LM Studio):
| Parameter | Value |
|---|---|
| Temperature | 0.7 |
| Top K Sampling | 20 |
| Presence Penalty | 1.5 |
| Repeat Penalty | Disabled |
| Top P Sampling | 0.8 |
| Min P Sampling | 0 |
| Seed | 42 |
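If you run the GGUF with llama.cpp's llama-server instead of LM Studio, the table maps roughly onto the request fields below. This is an assumption based on llama-server's /completion parameters (temperature, top_k, top_p, min_p, repeat_penalty, presence_penalty, seed); "Disabled" repeat penalty is written as 1.0, n_predict=512 is an arbitrary placeholder, and build_completion_request is a hypothetical helper that only builds the JSON body without sending it.

```python
import json

# Values from the recommended-settings table; "Disabled" repeat penalty is
# written as 1.0 (a penalty multiplier of 1.0 leaves logits unchanged).
RECOMMENDED_SAMPLING = {
    "temperature": 0.7,
    "top_k": 20,
    "presence_penalty": 1.5,
    "repeat_penalty": 1.0,
    "top_p": 0.8,
    "min_p": 0.0,
    "seed": 42,
}


def build_completion_request(prompt: str, n_predict: int = 512) -> str:
    """Build (but do not send) a JSON body for llama-server's /completion endpoint."""
    body = {"prompt": prompt, "n_predict": n_predict, **RECOMMENDED_SAMPLING}
    return json.dumps(body)


print(build_completion_request("Hello!"))
```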
Enjoy ^_^
PS: The Qwen team has released a 3.6 27B version. I can't run it on my RTX 3060 12GB, but I will heal it for the community and release it after HauhauCS releases the 27B uncensored version.