DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]
DeepSeek dropped the full V4 paper this week. The preview from April was 58 pages, and this version adds a lot of technical depth. What stood out for me: FP4 quantization-aware training. They're running FP4 QAT directly in late-stage training. MoE expert weight…
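
For anyone who hasn't worked with QAT before, here is a rough sketch of the "fake quantization" step that FP4 QAT generally relies on. This is not taken from the paper. It assumes the common E2M1 FP4 format (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and a single absmax scale per block, and the names `FP4_E2M1_GRID` and `fake_quant_fp4` are my own. DeepSeek's actual scaling and rounding recipe may well differ.

```python
# Hypothetical sketch of FP4 (E2M1) fake quantization; not the paper's method.
# Magnitudes representable in E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quant_fp4(block):
    """Round a block of floats to FP4 with one shared scale, then dequantize.

    Assumption: per-block absmax scaling, so the largest magnitude maps to 6.0.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)  # nothing to scale in an all-zero block
    scale = amax / FP4_E2M1_GRID[-1]
    out = []
    for x in block:
        mag = abs(x) / scale
        # Pick the nearest representable magnitude on the FP4 grid.
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        out.append((q if x >= 0 else -q) * scale)
    return out


# In QAT the forward pass uses the fake-quantized weights, while the backward
# pass usually treats the rounding as identity (straight-through estimator),
# so the higher-precision master weights keep receiving gradients.
weights = [0.02, -0.11, 0.37, -0.6]
quantized = fake_quant_fp4(weights)
print(quantized)
```

With only eight magnitudes available, small weights in a block collapse to zero once one large weight sets the scale, which is part of why the choice of block size and scaling matters so much at this precision.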