Law Professors Prefer AI Answers Over Human Responses 75% of Time in Stanford Study

Stanford University researchers found that law professors preferred AI-generated contract law answers over those written by fellow professors about 75% of the time. In 2,918 blinded comparisons, 16 professors from 14 U.S. law schools chose Google's Gemini 2.5 Pro responses 75.92% of the time and NotebookLM responses 74.75% of the time over answers from human instructors. The study tested whether large language models could align with professional legal reasoning standards across legal doctrine, case law, hypotheticals, and policy issues. It arrives as law schools and courts increasingly integrate AI tools into legal practice.

Stanford Study Tests AI Against Law Professors on Contract Law Questions

The study involved 16 professors from 14 U.S. law schools, including Stanford, Yale, New York University, the University of Chicago, Georgetown, UCLA, and the University of Virginia. The professors created 40 contract law questions covering legal doctrine, case law, hypotheticals, and policy issues. Researchers designed the evaluation to test AI capabilities in domains requiring judgment rather than single correct answers.

"Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth," the researchers wrote. "Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test."

Professors evaluated answer pairs in blinded comparisons. Without knowing whether an answer came from AI or a human instructor, they selected the response they would rather give a student.

Gemini 2.5 Pro and NotebookLM Win 75% of Professor Comparisons

Google's Gemini 2.5 Pro won 75.92% of its matchups against human instructors, while NotebookLM won 74.75% of the time. The researchers analyzed whether the results reflected broader professional consensus by examining agreement rates when professors evaluated the same answer pairs.

"Observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs' success reflects alignment with common disciplinary criteria," the researchers wrote.
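To make the agreement comparison concrete, here is a minimal Python sketch of one way to set observed agreement against a chance baseline. It is not the researchers' method, which the article does not describe. It assumes two professors rate the same answer pair independently and that each picks the AI answer with the same probability; the function names and the example votes are hypothetical.

```python
from typing import Sequence, Tuple


def chance_agreement(p_ai: float) -> float:
    """Agreement expected if two raters choose independently.

    Assumption: each rater picks the AI answer with probability p_ai,
    so they agree when both pick AI or both pick the human answer.
    """
    return p_ai ** 2 + (1.0 - p_ai) ** 2


def observed_agreement(votes: Sequence[Tuple[str, str]]) -> float:
    """Share of doubly rated answer pairs on which both raters agreed."""
    if not votes:
        raise ValueError("need at least one doubly rated pair")
    matches = sum(1 for first, second in votes if first == second)
    return matches / len(votes)


# Hypothetical votes for illustration only, not data from the study.
example_votes = [
    ("ai", "ai"),
    ("ai", "human"),
    ("human", "human"),
    ("ai", "ai"),
]

baseline = chance_agreement(0.75)  # 0.75 approximates the reported AI win rate
observed = observed_agreement(example_votes)
print(f"chance baseline: {baseline:.3f}, observed: {observed:.3f}")
```

Under these assumptions, a roughly 75% preference for AI answers implies a chance baseline of 62.5%, so observed agreement would need to exceed that figure to suggest shared criteria rather than coincidence.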

AI models outperformed human instructors across multiple categories, including recall questions about cases, codes, or doctrine, as well as hypotheticals and policy discussions. To test whether the AI advantage stemmed from surface-level writing style rather than substantive content, the researchers analyzed lexico-syntactic features such as answer length, structural organization, reasoning nuance, legal anchors, confidence tone, clarity, and pedagogical support.

In a separate analysis of additional models, Anthropic's Claude Opus 4.7 ranked first, followed by OpenAI's ChatGPT 5.4 and Gemini 2.5 Pro. Every AI model evaluated outperformed human instructors on average.

AI Models Record Lower Harmfulness Rates Than Human Instructors

AI-generated answers were flagged as harmful less often than those written by professors. Gemini recorded a 3.41% harmfulness rate and NotebookLM recorded 3.64%, compared with 12.06% for human instructors.

The researchers noted that the study did not measure whether answers matched each professor's individual teaching preferences. "While LLM responses are generally preferred over those of human instructors, our evaluation setting does not allow us to directly measure the extent to which instructor preferences are satisfied," the study stated. "It is at least theoretically possible that LLMs, although generally delivering stronger responses, still generate answers that are merely viewed as 'good enough.'"

Los Angeles Court and Law Schools Adopt AI Tools

The Los Angeles Superior Court began testing AI tools in March to help judges manage growing caseloads. Law schools are adding AI training programs as the legal profession integrates artificial intelligence.

"The potential benefits of these new technologies as a force multiplier in the practice of law just can't be ignored," Mississippi College School of Law Dean John P. Anderson told Decrypt. "Whether our students plan to be litigators or transactional attorneys, their future employers will expect familiarity with these AI tools. We want the firms hiring our students to be confident that every MC Law grad is competent in AI technologies."

Sullivan & Cromwell Admits Fake AI Citations in Bankruptcy Filing

Law firms continue to confront cases undermined by hallucinations and other AI-generated errors. In April, law firm Sullivan & Cromwell admitted to a U.S. bankruptcy court that a recent filing in a high-profile case contained fake citations generated by AI.

FAQ

What percentage of the time did law professors prefer AI-generated answers over human-written answers in the Stanford study?

Law professors preferred AI-generated answers approximately 75% of the time in the Stanford study. Google's Gemini 2.5 Pro won 75.92% of its matchups against human instructors, while NotebookLM won 74.75% of the time across 2,918 blinded comparisons.

How did AI harmfulness rates compare to human instructor responses in the study?

AI-generated answers recorded lower harmfulness rates than human instructor responses. Gemini had a 3.41% harmfulness rate and NotebookLM had a 3.64% rate, compared with 12.06% for human instructors.

What AI tools is the Los Angeles Superior Court testing?

The Los Angeles Superior Court began testing AI tools in March to help judges manage growing caseloads, though the specific tools were not identified in the source.
