Zyphra has introduced ZAYA1-8B, a mixture-of-experts reasoning model that activates only 700M parameters per token yet matches or exceeds the much larger DeepSeek-R1-0528 on mathematics and coding benchmarks. The model was trained entirely on AMD infrastructure and incorporates a novel test-time compute method called Markovian RSA, which achieves 91.9% accuracy on AIME'25 and 89.6% on HMMT'25 while keeping the reasoning context compressed to just 4K tokens.
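The summary does not specify how Markovian RSA works internally, but the core idea of a "Markovian" bounded reasoning context can be illustrated with a toy sketch: rather than letting the reasoning trace grow without limit, the model conditions each decoding chunk only on a fixed-size carried state. All function names and the chunking scheme below are illustrative assumptions, not Zyphra's actual implementation.

```python
# Toy sketch of bounded-context ("Markovian") reasoning.
# Hypothetical names throughout; this is not the real Markovian RSA algorithm.

STATE_BUDGET = 4096  # tokens carried between chunks (the 4K figure above)

def generate_chunk(prompt_tokens, state_tokens):
    """Stand-in for one bounded decoding step of a reasoning model.

    A real model would decode new reasoning tokens conditioned on the
    prompt plus the carried state; here we fabricate placeholder tokens.
    """
    return [f"tok{len(state_tokens) + i}" for i in range(8)]

def markovian_reason(prompt_tokens, num_chunks=5):
    state = []  # compressed reasoning context, capped at STATE_BUDGET
    for _ in range(num_chunks):
        new_tokens = generate_chunk(prompt_tokens, state)
        # Keep only the most recent STATE_BUDGET tokens, so memory use
        # stays constant no matter how long the reasoning runs.
        state = (state + new_tokens)[-STATE_BUDGET:]
    return state

trace = markovian_reason(["problem statement"])
assert len(trace) <= STATE_BUDGET
```

The appeal of such a scheme is that inference cost per step stays flat as reasoning length grows, which is consistent with the efficiency framing of the announcement.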
Why it matters: This demonstrates that efficient, smaller reasoning models can achieve competitive performance with much larger systems, potentially reshaping the economics of AI inference and making advanced reasoning capabilities more accessible to resource-constrained deployments.