OpenAI Releases LifeSciBench with 750 Expert Tasks to Evaluate AI in Real Scientific Workflows

According to Odaily, OpenAI has released LifeSciBench, a new evaluation benchmark comprising 750 expert-written tasks that span 7 scientific research workflows and 7 biology domains. The benchmark was developed by 173 researchers who hold PhDs and have experience in the biotech or pharmaceutical industries.

More than 79% of the tasks require multi-step reasoning, with tasks averaging 4 reasoning steps each. The benchmark also includes 1,062 real scientific data attachments, including papers, charts, sequence data, and structural files. It assesses complex research capabilities such as evidence integration, experimental design, data analysis, scientific reasoning, and research communication.
