Note: OpenAI leads the general AI space, but other AI companies are developing deep research tools and experimenting with AI-powered searching. You may have faculty or students asking you to present these tools to their classes.
OpenEvidence is a popular, AI-powered medical information platform designed to assist healthcare professionals, particularly physicians, by providing quick, evidence-based answers to clinical questions at the point of care. Created within the Mayo Clinic Platform Accelerate program, it aggregates and synthesizes medical literature from trusted sources such as PubMed, the New England Journal of Medicine, and JAMA, drawing on the more than 35 million peer-reviewed publications indexed in PubMed. OE offers features such as real-time literature searches, summarized responses with citations, and tools for administrative tasks such as writing prior authorization letters.
OpenEvidence is free to use for verified U.S. healthcare professionals and is used by over 40% of U.S. physicians daily, handling millions of clinical consultations each month. It scored over 90% on the USMLE, outperforming other AI models such as ChatGPT in accuracy. OE is HIPAA-compliant but emphasizes that it does not provide medical advice and that users should verify outputs against clinical expertise and bona fide library resources. Investors include Sequoia, Google Ventures, and Kleiner Perkins; a $210 million Series B raised the company's valuation to $3.5 billion.
OpenEvidence emphasizes real-time clinical decision support and broader content partnerships, such as with Elsevier’s ClinicalKey AI. OpenEvidence outperforms other AI search tools by delivering concise, clinically focused summaries; however, clinicians should use these tools as adjuncts to, not replacements for, clinical expertise and comprehensive resources such as MEDLINE and UpToDate.
Many (if not all) of the AI-powered search tools such as OpenEvidence use retrieval augmented generation (RAG) techniques to deliver results. RAG refers to a technique combining the strengths of retrieval-based and generative AI models. In RAG, an AI system first retrieves information from a large dataset or knowledge base and then uses this retrieved data to generate a response or output. Essentially, the RAG model augments the generation process with additional context or information pulled from relevant sources.
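To make the idea concrete, here is a minimal, self-contained sketch of the RAG pattern. It is a toy illustration only, not OpenEvidence's actual pipeline: the three-document corpus, the word-overlap scoring, and the retrieve and build_prompt helpers are all hypothetical, and the final call to a generative model is deliberately left out.

```python
# Toy sketch of retrieval augmented generation (RAG).
# Hypothetical example only; not any vendor's real implementation.
from collections import Counter
import math
import re

# 1. A tiny in-memory "knowledge base" standing in for a literature index.
corpus = {
    "doc1": "Metformin is first-line therapy for type 2 diabetes.",
    "doc2": "Statins reduce LDL cholesterol and cardiovascular risk.",
    "doc3": "ACE inhibitors are used to treat hypertension and heart failure.",
}

def tokenize(text: str) -> Counter:
    # Lowercase the text and split it into alphanumeric word tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # 2. Retrieval step: rank documents by similarity to the query and keep the top k.
    q = tokenize(query)
    ranked = sorted(corpus.values(), key=lambda doc: cosine(q, tokenize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # 3. Augmentation step: prepend the retrieved passages as context for a
    #    generative model; the model call itself is omitted in this sketch.
    context = "\n".join(f"- {passage}" for passage in retrieve(query))
    return (
        "Answer the question using only the sources below, citing them.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )

print(build_prompt("What is first-line therapy for type 2 diabetes?"))
```

In a production system, the keyword scoring above would be replaced by semantic (embedding-based) retrieval over millions of documents, and the assembled prompt would be passed to a large language model that generates the cited answer.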
Presentations
Note: This presentation was selected by a librarian because of the presenter and their understanding of the product. As this is a marketing video and tutorial, some of its claims should be tested and verified.
Who is behind OpenEvidence?
Daniel Nadler, PhD: founder of OpenEvidence, a Sequoia-backed, AI-powered medical information platform aimed at organizing and synthesizing medical knowledge. Nadler previously founded Kensho Technologies, which was acquired by S&P Global in 2018, and was named to the TIME100 Health list in 2025 for contributions to global health.
Zachary Ziegler: co-founder. Ziegler has a background in machine learning from a PhD program at Harvard and focuses on leveraging AI to aggregate clinical wisdom and support physicians in decision-making.
Dr. Travis Zack: mentioned in a Reddit AMA alongside Zachary Ziegler, indicating involvement in the platform’s development or an advisory role.
Antonio J. Forte, MD, PhD: a director at Mayo Clinic, noted for supporting OpenEvidence’s mission to reduce the time clinicians spend searching for medical information.
John J. Lee, MD: listed on OpenEvidence’s website as part of the team, though their specific role is not detailed.
OpenEvidence is supported by investors including Sequoia Capital, Google Ventures (GV), Kleiner Perkins, Coatue, Thrive Capital, Conviction Partners, and Mayo Clinic.
Bottom line: For health sciences librarians, OpenEvidence might support their work with health professionals. However, its underlying AI technologies raise concerns for those interested in scientific accuracy, transparency, and rigour in point-of-care searching. I like to make a distinction between searching for sources and searching for answers; LLMs tend to provide the second (answers, which may not be correct and/or accurate) while hiding the first (search sources and process). This is not optimal when supporting clinicians in their patient care and clinical decision-making. I would refer a physician to DynaMed or UpToDate for point-of-care information. Note that the information provided here changes, so check the tool's website for the most current information (or discuss with a librarian).
Michelle Kraft summed up the lack of access appropriately, "....So can I recommend OpenEvidence? I don’t know…and that’s exactly the problem. It’s the latest AI-powered darling of medicine, launched by Harvard-affiliated founders and backed by $210M in funding. It’s free for verified U.S. physicians and medical professionals with an NPI, monetized through advertising, and praised for saving doctors time. But as a medical librarian, someone trained in evidence evaluation and information retrieval, I’m locked out. No NPI, no access. That means I can’t assess its sources, search precision, transparency, or even help clinicians connect it to the full text of the citations. When the very people who specialize in evaluating medical information are excluded, it raises concerns. Until more voices from the information side of healthcare are included and kick the AI’s tires, it’s hard to fully know if OpenEvidence is smart medicine….or just smart marketing?"
This study evaluates the trustworthiness and readability of AI chatbot responses to questions about operative care for MDO. The study was conducted using ChatGPT, Google Gemini, Microsoft Copilot, and OpenEvidence. Twenty common questions were developed. The authors used a modified DISCERN tool to assess quality and the SMOG (Simple Measure of Gobbledygook) test to evaluate response readability. Modified DISCERN analysis revealed that clear aims and relevancy scored highest (mean=4.92, SD=0.31; mean=4.64, SD=0.62). Additional sources provided and citation of sources had the lowest means (mean=2.19, SD=1.52; mean=2.93, SD=1.96). Microsoft Copilot scored highest in overall quality (mean=38.10 versus ChatGPT=29.90, P<0.001). OpenEvidence scored lowest in shared decision-making (mean=1.80, SD=1.10). Differences in readability were found across all AI models (mean=17.31, SD=3.59, P<0.001), indicating that the average response was written at a graduate school reading level. OpenEvidence (mean=22.24) produced higher SMOG reading scores than ChatGPT (mean=15.89), Google Gemini (mean=15.66), and Microsoft Copilot (mean=15.44) (P<0.001). The findings highlight a need to review the reliability of AI chatbots.
Note: Please use your critical reading skills while reading entries. No warranties, implied or actual, are granted for any health or medical search or AI information obtained while using these pages. Check with your librarian for more contextual, accurate information.