OpenAI Releases LifeSciBench to Evaluate AI on Real Research Tasks, Comprising 750 Expert-Written Items Across 7 Biology Fields

According to OpenAI's official announcement on June 20, the company has released LifeSciBench, a new benchmark for evaluating AI systems on real-world scientific research tasks. The benchmark comprises 750 expert-written tasks spanning 7 research workflows and 7 biology domains. The tasks were written by 173 Ph.D.-level researchers with biotech or pharmaceutical industry experience.

More than 79% of the tasks require multi-step reasoning, with an average of about 4 reasoning steps per question. The benchmark includes 1,062 attachments of real research data, such as papers, charts, sequence data, and structural files. Rather than testing simple factual recall, it emphasizes complex research capabilities: evidence integration, experimental design, data analysis, scientific reasoning, and research communication.
