The Context

I’ve been following this thread on Qwen 3.5 by u/EvilEnginer, which claims a 90% error reduction from scaling specific ssm_conv1d.weight tensors.

My Testing

I’m interested in seeing whether we can confirm those results and turn the fix into a standard, transparent utility for the community. Based on u/EvilEnginer’s findings about tensor scales in the final blocks, I’ve written an independent tool that automates detection and repair of this drift. However, my initial testing is inconclusive:

- NIAH (Needle In A Haystack) @ 125k context: Both the original BF16 and my repaired version passed with identical scores. I didn’t see the context "melt-down" described in the original thread, which suggests the fix may target a more specific failure mode (such as logic loops or code generation) that NIAH doesn’t catch.

The Tool & Call for Collaboration

The tool automates both the detection (using Median Absolute Deviation Z-scores) and the repair logic. I’d love for the community to help confirm u/EvilEnginer’s findings and refine the tool so we have a reliable, open-source way to apply these repairs. As I don’t have the horsepower, I am hoping we can do some:
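For reference, the detection step mentioned above (Median Absolute Deviation Z-scores) could look roughly like the sketch below. This is not the actual tool, only an illustration of the general technique. The tensor names, the scale values, the choice of "one scale number per tensor" as the statistic, and the 3.5 cutoff (a common convention for modified Z-scores) are all assumptions made for the example. The repair step is not shown.

```python
from statistics import median


def mad_z_scores(values):
    """Return modified Z-scores based on the Median Absolute Deviation (MAD)."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median, so no spread can be estimated.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary Z-scores
    # when the data is roughly normally distributed.
    return [0.6745 * (v - med) / mad for v in values]


def find_outliers(scales, threshold=3.5):
    """Return {tensor_name: score} for tensors whose |score| exceeds threshold.

    `scales` maps a tensor name to a single scale statistic (for example its
    standard deviation). Both the statistic and the threshold are assumptions.
    """
    names = list(scales.keys())
    scores = mad_z_scores(scales.values())
    return {name: z for name, z in zip(names, scores) if abs(z) > threshold}


# Hypothetical tensor names and made-up scale values, for illustration only.
example_scales = {
    "blk.0.ssm_conv1d.weight": 0.101,
    "blk.1.ssm_conv1d.weight": 0.098,
    "blk.2.ssm_conv1d.weight": 0.103,
    "blk.3.ssm_conv1d.weight": 0.099,
    "blk.4.ssm_conv1d.weight": 0.100,
    "blk.5.ssm_conv1d.weight": 0.250,  # deliberately far from the rest
}

outliers = find_outliers(example_scales)
for name, score in outliers.items():
    print(f"{name}: modified Z-score {score:.1f}")
```

The median and MAD are used instead of the mean and standard deviation because a single drifted tensor barely moves them, so the outlier cannot hide itself by inflating the spread it is measured against.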