Z-Lab did some good work on speeding up output, while Luce managed to use smaller models of the same family to accelerate prefill. Since Heretic and other "smart ablation" tools can decensor a model, would a model decensored this way still work with these multi-model speedup methods?
P.S. I wish more people would get on the PFlash bandwagon, since both Qwen3.6 and Gemma 4 have smaller models. A 5-10x speedup seems ludicrous.