According to Beating, MiniMax has released its M2 technical report on arXiv, detailing its flagship MoE (mixture-of-experts) architecture and its agent training system, Forge. The report describes how Forge optimizes long-context agent reinforcement learning through windowed FIFO scheduling and prefix-tree merging, achieving a training speedup of up to 40x.
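To give a rough sense of why prefix-tree merging can reduce work, here is a minimal Python sketch of the general idea: rollouts that share a common token prefix are stored once in a trie, so the shared tokens need to be processed only once. This is a generic illustration and not Forge's implementation; the names `TrieNode`, `merge_into_prefix_tree`, `count_nodes`, and the toy token IDs are all hypothetical.

```python
from typing import Dict, List


class TrieNode:
    """One token position in the merged prefix tree (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}


def merge_into_prefix_tree(sequences: List[List[int]]) -> TrieNode:
    """Insert every token sequence into one trie so shared prefixes share nodes."""
    root = TrieNode()
    for seq in sequences:
        node = root
        for tok in seq:
            # Reuse the existing child if this prefix was already seen.
            node = node.children.setdefault(tok, TrieNode())
    return root


def count_nodes(node: TrieNode) -> int:
    """Count token nodes below `node` (the root itself holds no token)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Toy rollouts (assumed data): all three share the prefix [1, 2, 3].
rollouts = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 6, 7],
    [1, 2, 3, 4, 8],
]

tree = merge_into_prefix_tree(rollouts)
total_tokens = sum(len(r) for r in rollouts)  # tokens if processed separately
unique_tokens = count_nodes(tree)             # tokens after merging prefixes

print(f"separate: {total_tokens} tokens, merged: {unique_tokens} tokens")
```

In this toy case the three rollouts contain 15 tokens in total but only 8 distinct trie nodes. The saving grows with the length of the shared prefix, which is plausibly why the technique suits long-context agent trajectories that branch from a common history.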
M2.7 demonstrated autonomous agent self-evolution, completing more than 100 cycles of analysis, code revision, and testing. On benchmarks, M2.7 scored 56.22% on SWE-Pro and 52.7% on Multi-SWE-bench, and posted a 66.6% average reward rate on MLE Bench, approaching the performance of Gemini 3.1.