
OpenEvidence AI scores 100% on USMLE as company launches free explanation model for medical students
https://www.fiercehealthcare.com/ai-and-machine-learning/openevidence-ai-scores-100-usmle-company-offers-free-explanation-model
Artificial intelligence startup OpenEvidence says its AI model has scored a perfect 100% on the United States Medical Licensing Examination (USMLE), raising the bar on the proficiency of AI models to interpret medical information.
The company spent the last six months evolving the core AI models and other technologies that power OpenEvidence and training a more advanced reasoning model, Daniel Nadler, Ph.D., founder of OpenEvidence, told Fierce Healthcare. The company’s AI models scored 90% on the USMLE two years ago.
“The models can actually reason step by step and do what I would call a second or third derivative reasoning, which means not just taking the fact that comes in before you, but taking the factors in before you, figuring out what those imply, and then reasoning through the implications,” he said, noting that OpenEvidence’s AI models have achieved “super high-grade medical reasoning.”
The USMLE is a three-step exam that is required for medical licensing in the U.S., with each step evaluating different aspects of whether a doctor has the knowledge and skills to provide safe and effective medical care.
Nadler asserts that what is key to this development is that OpenEvidence is offering a new AI system that not only accurately answers each question on the USMLE but also teaches the reasoning behind each answer.
The company is rolling out an explanation model that demonstrates the reasoning behind the correct answers as a free medical education resource. The models provide accurate references using gold-standard sources of medical knowledge such as The New England Journal of Medicine (NEJM) and the Journal of the American Medical Association (JAMA), the company said.
Nadler said the new AI explanation models and other tools will help “democratize” access to quality medical education resources and support.
OpenEvidence developed an AI-powered medical search engine and generative AI chatbot exclusively for doctors that summarizes and simplifies evidence-based medical information. Founded in 2022 by Nadler, the company touts that it’s the most widely used medical search engine among U.S. clinicians, claiming more than 40% of physicians in the U.S. use its platform.
The company offers its chatbot to physicians for free. It is actively used across more than 10,000 hospitals and medical centers nationwide, and it continues to grow by more than 65,000 new verified U.S. clinician registrations each month, the company claims.
OpenEvidence has formed strategic content partnerships with the American Medical Association, the NEJM, JAMA and all 11 JAMA specialty journals, including JAMA Oncology and JAMA Neurology. The startup has raised more than $300 million since its founding, including a $210 million Series B round last month at a $3.5 billion valuation.
The new AI explanation model will be available for free for physicians with a national provider identifier and professionals with a medical education number, Nadler said.
These tools were built “with a focus on education, creating vignette and case-based learning customizable by training level with reasoning and explanations grounded in the current medical literature,” according to the company. The launch reflects OpenEvidence’s continued effort to improve physician knowledge at all levels of medical education.
“There’s an enormous amount of inequality in medical education in the United States and in preparation for medical school exams,” Nadler noted, as the cost of medical school education continues to rise.
With the launch of the free medical education tools, Nadler says he is coming full circle: he worked at test prep company Kaplan while he was in school.
“Those who could pay for very expensive high-grade test prep tended to do better on the test because they had people like me walking them through the explanations of the answer,” he said. “I’ve seen the direct value of seeing an explanation for an answer. An explanation of why an answer is the correct answer is so critical in helping someone prepare and study for a test and to learn the body of knowledge. The point is not just to pass the test. The point is to learn it.”
With advancements in AI and its increasing use in healthcare, AI companies and researchers have been evaluating the performance of AI models on medical exams to test their proficiency with medical knowledge and interpretation. AI’s performance on medical exams has significantly improved over time. In late 2022, a study found that OpenAI’s ChatGPT was able to score at or close to the 60% passing grade needed for the USMLE. The following year, researchers evaluated GPT-4 on USMLE Step 1-style questions, and it answered 86% of the 1,300 questions accurately.
OpenAI’s latest model, GPT-5, scored 97%, according to OpenEvidence’s evaluation, Nadler said.
Back in April, biomedical informatics researchers at the University of Buffalo said a clinical AI tool they developed demonstrated improved accuracy on all three parts of the USMLE. The tool, called Semantic Clinical Artificial Intelligence (SCAI), scored as high as 95.1% on Step 3 of the USMLE, notably outperforming GPT-4 Omni, which scored 90.5% on the same test, according to a paper published in JAMA Network Open.
However, testing AI on medical exams is just one benchmark. Many industry experts argue that these evaluations focus heavily on question-and-answer tests and not enough on real-world medical tasks.
Microsoft’s AI team recently released a paper that evaluated how well AI diagnosed medically complex cases from the NEJM as compared to doctors. That paper looked at Microsoft’s AI-enabled diagnostic system, called the Microsoft AI Diagnostic Orchestrator (MAI-DxO), and found it can accurately diagnose up to 85% of complex medical cases, a rate more than four times higher than a group of experienced physicians.
Nadler agrees that a USMLE score is not the best way to evaluate OpenEvidence in a clinical context.
“I think this is just strictly applicable to helping future doctors really prepare for and understand the answers to the USMLE,” he said. “What we’re actually much more proud of is the ability to reason out and explain step by step why something is the right answer. I think there’s a lot of value there. This, to me, is much closer to medical intelligence.”
Health tech and AI companies are racing to expand their footprints in healthcare and to be the go-to tools for doctors in medical decision support. Just last week, Doximity bought Pathway Medical for $63 million to bulk up its healthcare AI capabilities as it looks to offer more free AI tools to doctors. Pathway claims it has one of the largest structured data sets in medicine, “spanning nearly every guideline, drug and landmark trial across all major specialties.”
Earlier this year, Pathway said its AI models topped the publicly reported USMLE leaderboard, achieving 96% accuracy. At the time, Pathway said it outperformed other medical AI systems such as GPT-4, Med-Gemini, OpenEvidence and Hippocratic AI.
The rivalry has also turned litigious: OpenEvidence sued Pathway for trade secret theft earlier this year, alleging the company “invaded the OpenEvidence AI platform repeatedly and executed dozens of ‘prompt injection’ attacks.”
OpenEvidence filed a similar suit against Doximity in June.