Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein (neuron level surgery)

Hello everyone. During data debugging session on per tensor and per neuron level I found that neurons in tensor layers in MoE model can die (have zero value). Here the log.

For example In blk.0.ffn_gate_exps.weight and blk.0.ffn_up_exps.weight in Qwen3.6 35B A3B Q8_0 quant:

I found 40% of zero neurons.

In Qwen3.5 9B I didn't found any zero blocks. All blocks in it contain value.

Don't know why this is happened. I never trained LLM's by myself, but this problem exists. A company I'm interviewing with independently confirmed these findings using different detection methods. But I think this is the main reason why LLM degrade during training.

I fixed the model as much I can on Google Collab Free Tier CPU on binary level. And restored dead neurons (7.5 million zero blocks in Q8 quant) in tensors via copy/pasting binary weight data from healthy neighbour neurons to dead neurons + linear interpolation.

Here fixed GGUF model: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF

And benchmark from user: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-GGUF/discussions/1#69e772a7b01172a7d35fb655

And .safetensors fp8_e4m3fn version: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-Safetensors

I converted Q8_0 to .safetensors via this script: https://huggingface.co/LuffyTheFox/Qwen3.6-35B-A3B-Plus-Uncensored-Wasserstein-Safetensors/raw/main/gguf_to_safetensors.py

FP8 version uncensored version in .safetensors is trainable - gradients are alive in it without zeros.

Model is based on this one: https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive . Thanks to HauhauCS for amazing job.

System prompt: https://pastebin.com/pU25DVnB

Chat template: https://pastebin.com/Dy2fmmpN

Recommended quants: MXFP4_MOE and Q8_0

Recommended Settings (LM Studio):

Parameter	Value
Temperature	0.7
Top K Sampling	20
Presence Penalty	1.5
Repeat Penalty	Disabled
Top P Sampling	0.8
Min P Sampling	0
Seed	42

Enjoy ^_^

PS: Qwen team released 3.6 27B version. I can't use it on my RTX 3060 12GB, but I will heal it for community and release after HauhauCS 27B uncensored release.

submitted by /u/EvilEnginer
[link] [comments]

Leave a Comment