DeepSeek released the full V4 technical paper this week, detailing FP4 quantization-aware training that cuts inference compute to 10-27% of the V3.2 baseline while preserving 99.7% of baseline quality. The paper also documents novel training-stability mechanisms for trillion-parameter MoE models, including anticipatory routing and SwiGLU clamping, plus a generative reward model approach that unifies inference and evaluation.
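To make the core idea concrete, here is a minimal sketch of FP4 fake quantization, the basic operation in quantization-aware training: weights are rounded to the FP4 grid in the forward pass so the model learns to tolerate the precision loss. This assumes the common E2M1 FP4 format and per-tensor absmax scaling; DeepSeek's actual recipe is more sophisticated and may differ.

```python
# Hypothetical sketch of FP4 (E2M1) fake quantization for QAT.
# Assumptions: E2M1 format, per-tensor absmax scaling.

# The 8 non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fake_quant_fp4(xs):
    """Quantize to FP4, then immediately dequantize ("fake" quantization).

    In QAT the forward pass sees these rounded values, while the backward
    pass typically treats the op as identity (straight-through estimator).
    """
    amax = max(abs(x) for x in xs) or 1.0
    scale = amax / 6.0  # map the largest magnitude onto FP4's max value (6.0)
    out = []
    for x in xs:
        mag = abs(x) / scale
        q = min(FP4_GRID, key=lambda g: abs(g - mag))  # round to nearest grid point
        out.append(q * scale * (1.0 if x >= 0 else -1.0))
    return out

weights = [0.01, -0.3, 0.7, 1.0, -0.55]
print(fake_quant_fp4(weights))
```

At inference time the same grid lets each weight be stored in 4 bits plus a shared scale, which is where the large compute and memory savings come from.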
Why it matters: FP4 QAT with minimal quality loss could significantly reshape the economics of model training and deployment, particularly for multi-agent systems that issue large numbers of model calls.