According to Odaily, OpenAI has released LifeSciBench, a new evaluation benchmark comprising 750 expert-written tasks that span 7 scientific research workflows and 7 biology domains. The benchmark was developed by 173 researchers who hold PhDs and have experience in the biotech or pharmaceutical industries.
More than 79% of the tasks require multi-step reasoning, averaging 4 reasoning steps per task. The tasks are accompanied by 1,062 real scientific data attachments, including papers, charts, sequence data, and structural files. The benchmark assesses complex research capabilities such as evidence integration, experimental design, data analysis, scientific reasoning, and research communication.