FernflowerAI-35B-A3B-KL-ReLU-GGUF + Apple MLX

Qwen 3.5 35B A3B Uncensored HauhauCS (repaired) -> (now with KL + ReLU calibration)

Model available here: https://huggingface.co/LuffyTheFox/FernflowerAI-35B-A3B-KL-ReLU-GGUF

Repair summary: link

Extra information about how Qwen 3.5 35B got broken (and how I fixed it): link

V1 Apple MLX version (thanks to froggeric): https://huggingface.co/froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit

V2 Apple MLX version (final release): coming soon discussion here

History:
Hello everyone. A few days ago I released a fixed version of Qwen 3.5 35B A3B uncensored by HauhauCS - two broken tensors that Alibaba shipped with Qwen 3.5 35B A3B model, due to heavy complexity and bug during training process in AdamW optimizer ssm_conv1d.weight in blocks 36-37 were scaled back to normal. That fixed the major context collapse and looping. But after more testing, I found that some other tensors (experts, attention projections) had a subtler problem. Their overall scale and saturation looked fine, but the shape of their weight distribution was drifting away from the peer group. C1 and C2 didn't catch this. C3 (KL divergence) did.

So I added two more criteria to the diagnostic pass:

KL divergence - restores the distribution shape of tensors that drifted from their peer group without changing scale or saturation.
ReLU asymmetry - detects mean drift that AdamW can accumulate over time (didn't fire on this model, but the probe is there for others).

Results on this version:

Metric	Before	After
KL divergence (average)	0.1036	0.0297
KL reduction	—	71.3%
Repaired tensors (C2 + C3)	2	11

What this means for you:

The model was already stable after v1. Now it's tighter - fewer hidden distribution anomalies that could cause weird behavior on very long or complex tasks.
No new problems introduced. The 489 healthy tensors were left untouched.

Upgraded system prompt that unlocks deep thinking (works great with this model):
https://pastebin.com/pU25DVnB

Also you can use only one string in System Prompt. And add anything you want after it:
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.

Quantization script available here: https://pastebin.com/hXhcMJn9

Chat template: https://pastebin.com/uk9ZkxCR (supports tool calling)

Recommended Settings (LM Studio):

Temperature	0.7
Top K Sampling	20
Presence Penalty	1.5
Repeat Penalty	Disabled or 1.0
Top P Sampling	0.8
Min P Sampling	0
Seed	3407

Enjoy ^_^

submitted by /u/EvilEnginer
[link] [comments]

Leave a Comment