A note of warning about DFlash.
It was first announced with a 4x to 5x speed advantage over the usual bf16 models (tests are less optimistic, but let's assume this is true). Then it turned out the gain on MoE models is not that good; the figure was for dense models. Then quantization greatly reduces the gain: Q8_0 still gains, Q4_0 does not …