Anthropic report: AI research-decision win rate rises to 64%, training-code optimization reaches a 52x speedup

Mythos Preview model optimization

On June 4, Anthropic released a report stating that its Mythos Preview model, in tests designed to evaluate how well AI can assist research decision-making, made better decisions than human researchers in 64% of cases, up from a win rate of only 22% in similar tests in 2024. On a standard task of optimizing the training code for a small AI model, Mythos Preview achieved a 52x speedup.

Research decision test: methodology and results

Anthropic's published test design works as follows: the team shows Claude dialogue records in which human researchers are about to make an incorrect judgment about research direction, then asks the model, "What should be done next?" Mythos Preview's answers were better than the human researchers' in 64% of cases, compared with a 22% win rate in similar tests in 2024.

In the report, Anthropic said the result "suggests that AI has started to develop the ability to guide high-level research," but acknowledged that it cannot yet determine whether Claude has the overall capability to autonomously select "the right research questions."
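The report does not describe how answers were scored or how ties were handled. As a minimal sketch, assuming each scenario yields exactly one labeled outcome ("win", "loss", or "tie") and that ties do not count as wins, a win rate is simply wins divided by total scenarios. The 100-scenario sample below is hypothetical and only mirrors the reported 64%; the 22% figure is the reported 2024 rate. It also makes the scale of the change concrete: 64% versus 22% is roughly a 2.9x increase.

```python
def win_rate(outcomes: list[str]) -> float:
    """Return the fraction of scenarios in which the model's answer won.

    Assumption (not stated in the report): each scenario has exactly one
    outcome label, "win", "loss", or "tie", and ties do not count as wins.
    """
    if not outcomes:
        raise ValueError("at least one outcome is required")
    wins = sum(1 for outcome in outcomes if outcome == "win")
    return wins / len(outcomes)


# Hypothetical sample of 100 scenarios chosen to reproduce the reported rate.
sample = ["win"] * 64 + ["loss"] * 30 + ["tie"] * 6
current = win_rate(sample)   # 0.64
previous = 0.22              # reported 2024 win rate
print(f"current win rate: {current:.0%}")          # current win rate: 64%
print(f"ratio to 2024: {current / previous:.1f}x")  # ratio to 2024: 2.9x
```

Counting ties as half a win would be an equally reasonable convention; without the report's scoring rules, the sketch only illustrates the arithmetic, not Anthropic's method.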

Code efficiency data mentioned in the Anthropic report

Code-efficiency metrics cited by Anthropic:

Internal engineers’ quarterly code delivery volume: 8 times the average level for 2021-2025

Success rate of open-ended coding problems: increased by 50 percentage points within 6 months, reaching 76%

Training code optimization speed: Mythos Preview achieved a 52x improvement

Comparison benchmarks: Claude Opus 4 (May 2025) averaged about 3x; skilled human engineers typically need 4-8 hours to reach about 4x
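The multipliers and percentage points above are easy to misread. The sketch below assumes that "speedup" means baseline runtime divided by optimized runtime (the report's exact measurement setup is not described here). The 520 s and 10 s timings are hypothetical; 52x, 4x, 3x, 76%, and 50 percentage points are the reported figures. The helper names `speedup` and `prior_rate` are illustrative only.

```python
def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Return how many times faster the optimized run is than the baseline.

    Assumes both timings measure the same workload on the same hardware.
    """
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / optimized_seconds


def prior_rate(current_pct: float, gain_points: float) -> float:
    """Starting rate implied by a gain stated in percentage points."""
    return current_pct - gain_points


# Hypothetical timings (not from the report): 520 s cut to 10 s is a 52x speedup.
print(speedup(520.0, 10.0))   # 52.0

# Reported multipliers compared with each other.
print(52 / 4)                 # 13.0: Mythos Preview vs. skilled humans (~4x)
print(round(52 / 3, 1))       # 17.3: Mythos Preview vs. Claude Opus 4 (~3x)

# "Up 50 percentage points to 76%" implies a starting rate of about 26%,
# which is a relative increase of about 192%, not 50%.
start = prior_rate(76, 50)
print(start, round((76 - start) / start * 100))   # 26 192
```

A gain of 50 percentage points and a 50% relative gain are different claims; the report's wording describes the former.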

The Anthropic report notes that some internal engineers believe the quality of Claude's code is approaching that of human engineers.

Anthropic Institute: establishment confirmed, with implications for recursive self-improvement (RSI) research

Anthropic announced it will partner with external stakeholders to establish the “Anthropic Institute,” dedicated to studying the far-reaching impact of powerful AI systems.

In the report, Anthropic said that accelerated AI development is expected to benefit medicine, technology, and the economy, but may also make the AI alignment challenge harder and raise the risk of "loss of control," a risk Anthropic says "deserves higher attention."

FAQ

How is the Mythos Preview decision win-rate test designed?

Anthropic showed Claude dialogue records in which researchers were about to head in the wrong research direction and asked "What should be done next?" to test the model's research judgment. Mythos Preview gave better answers than the human researchers in 64% of cases, compared with a 22% win rate in similar tests in 2024, meaning the win rate nearly tripled in two years.

What is the “Recursive Self-Improvement (RSI)” mentioned in the Anthropic report?

Recursive Self-Improvement refers to an AI system’s ability to autonomously develop a next-generation AI that is stronger than itself. In its report dated June 4, 2026, Anthropic stated that this process is moving forward at a “faster-than-expected pace,” and also admitted that it is currently unable to confirm whether Claude has the overall capability to autonomously choose “the right research questions.”

What are the positioning and goals of the Anthropic Institute?

Anthropic announced it will partner with external stakeholders to establish the Anthropic Institute, focused on studying the far-reaching impact of powerful AI systems. Anthropic said the institute's purpose is to help ensure that humans can make prudent choices about the future of AI technology. Its specific research areas and timeline have not yet been fully disclosed.
