Note: OpenAI leads the general AI space, but many AI companies are developing deep research tools and experimenting with AI-powered academic search in support of research. Perhaps you have faculty or students asking you to present these tools to classes. For more information, see Which companies are behind AI search tools?
This entry tracks the burgeoning evidence of harms due to the rapid and unethical adoption of artificial intelligence (AI) in healthcare. Mitigation strategies and the patient-safety principle of "first, do no harm" are key to minimizing harms and risks in the clinical setting.
As AI is integrated further into research and clinical care worldwide, there is serious potential for increased harm to patients and health care workers. Real-world incidents have already set off alarm bells, illustrating a range of risks such as misdiagnoses, misleading recommendations to patients, confidentiality and privacy breaches, and the erosion of clinician expertise.
AI-based medical search tools are often opaque (“black box”) and poorly regulated, and they create vulnerabilities when search errors and hallucinations occur. The implementation of AI tools in medicine requires the utmost ethical vigilance, robust governance, transparency, and active safeguards to truly “first, do no harm.”
AI Patient Safety refers to a newer field of research and practice aimed at ensuring that AI systems operate in ways that are safe, reliable, and aligned with human values in health care. AI safety focuses on mitigating risks such as unintended consequences, misuse, bias, or loss of control, particularly as AI becomes more advanced and autonomous. The goal is to prevent harm to humans, society, or the environment. This involves technical measures (such as robust design and error correction), ethical considerations, and governance frameworks.
Protecting patients and clinicians from AI-related harm requires a culture of caution, where systems are introduced gradually, tested thoroughly, and monitored continuously.
Patient and clinician safety in this context depends on clear accountability, reliable data practices, and empowering users to question AI outputs. Preventing harm means designing technology that strengthens—not replaces—sound medical judgment.
Recent research into harms and AI
Recent research reveals ways AI can introduce risks into healthcare contexts.
Denecke et al. (2025) document real-world cases where AI deployment led to unexpected patient harm, illustrating failures such as algorithmic misclassification and workflow disruptions. García-Gómez et al. (2023) outline functional requirements aimed at mitigating patient harm, emphasizing the need for robust safety, validation, and monitoring frameworks.
Mooghali et al. (2024) focus specifically on cardiovascular care, identifying issues in transparency, data governance, and clinician oversight. Wider ethical concerns—including autonomy, justice, and beneficence—are explored by Savulescu et al. (2024), who argue for stronger regulatory and ethical guardrails during system design and deployment.
Ratwani et al. (2024) highlight the persistence of algorithmic bias in healthcare data and describe how it contributes to inequitable care.
Natali et al. (2025) provide a mixed-methods review showing how reliance on AI decision support systems may erode clinical judgment and procedural competence over time. Within this broader safety landscape, Wang et al. (2024) discuss the unique risks introduced by LLMs, such as hallucinations, misleading medical advice, and unpredictable model behaviours.
Systematic reviews by Wilhelm et al. (2025) and Xu & Shuttleworth (2024) further illustrate the complexity of AI harms, demonstrating that both technical failures and human–AI interaction issues can compromise patient safety.
Presentation
Dr. Spencer Dorn, Vice Chair and Professor of Medicine at UNC, outlines the risks of using AI in healthcare. Beyond well-known issues like privacy breaches, hallucinations, and bias, he warns of less-discussed risks: AI could paradoxically make clinicians’ work harder instead of easier, eroding promised efficiency gains. He highlights the danger of diminishing critical thinking skills as tasks like note-writing and summarization are offloaded to machines. Most importantly, Dorn stresses that AI could harm the human relationships at the heart of medicine if bots replace authentic communication. He also raises unresolved questions about legal responsibility in the absence of clear regulations, noting that clinicians may ultimately be left accountable for AI-driven errors.
Discussion
The literature reveals a convergence around several interconnected themes:
AI is a source of clinical risk when deployed without sufficient validation or oversight. New studies show that harms arise not only from incorrect outputs but also from subtle systemic failures: over-trust in automated recommendations and feedback loops that reinforce biases.
The black-box nature of many AI systems compounds these risks and harms. Clinicians may lack the skills to evaluate AI outputs, raising concerns about over-reliance and misuse, and AI-based tools and information can make physicians' work even more difficult.
Hallucinations from LLM-based tools pose additional challenges, as incorrect outputs can be especially difficult for non-expert users to detect.
Some research points to ethical deficits in AI deployment strategies. Many systems are introduced without sufficient patient consent mechanisms, data-governance safeguards, or equity-focused evaluations. Algorithmic bias is pervasive, with disproportionate harms falling on marginalized populations.
AI adoption may lead to long-term workforce implications, particularly through clinician deskilling. When AI tools assume responsibilities traditionally held by experts, the gradual erosion of clinical intuition and procedural competence becomes a plausible risk, potentially reducing the resilience of healthcare systems and weakening their ability to respond effectively when AI systems fail.
Researchers emphasize the need for robust governance and continuous monitoring. Technical guardrails are insufficient; ethical oversight, transparent reporting of harms, and inclusion of frontline clinicians and patients in design processes are essential for mitigating risk.
What roles do librarians play in this space?
Librarians have always assumed important roles in patient safety by providing timely, evidence-based information to clinicians at the point of care. Through expert literature searches, curation of high-quality resources, and teaching users how to locate evidence, librarians help to reduce medical errors, support guideline adherence, and promote informed decision-making.
Embedded in clinical rounds or available via clinical librarian programs, librarians bridge knowledge gaps, prevent misinformation, and contribute directly to safer medication practices, diagnosis, and treatment.
Health sciences librarians (HSLs) can view AI through the lens of patient safety, helping to mitigate harms and risks for the clinical team. It remains to be seen what form this will take in the AI era, and which roles best match our knowledge and skills. Stay tuned.
Conclusion
Without ethical and governance frameworks, AI systems will magnify inequities, compromise patient safety, and diminish clinical expertise in medicine. To use AI appropriately and responsibly while minimizing harms, stakeholders should prioritize a range of measures: transparency in the use of AI (see CHART above), human oversight of AI, and rigorous validation and continuous monitoring of AI outputs.
The Hippocratic principle of "do no harm" should guide every stage of AI development and deployment, from patient surveys to bedside and point-of-care uses. Only through deliberate, ethical, and evidence-based integration can AI contribute meaningfully to safer and more equitable healthcare systems. New and emerging evidence continues to underscore the urgency of responsible AI implementation and of upholding patient safety.
Garcia-Gomez JM, Blanes-Selva V, Romero CA, de Bartolomé Cenzano JC, Mesquita FP, Pazos A, Doñate-Martínez A. Mitigating patient harm risks: A proposal of requirements for AI in healthcare. Artificial Intelligence in Medicine. 2025 May 23:103168.
Goldberg CB, Adams L, Blumenthal D, Brennan PF, Brown N, Butte AJ, Cheatham M, DeBronkart D, Dixon J, Drazen J, Evans BJ. To do no harm—and the most good—with AI in health care. NEJM AI. 2024 Feb 22;1(3):AIp2400036.
Mahajan A, Bates DW. Clinical artificial intelligence-the case for a new physician role. Lancet Reg Health Am. 2025 Oct 25;51:101280. doi: 10.1016/j.lana.2025.101280. PMID: 41209077; PMCID: PMC12595013.
Mello MM, Guha N. Understanding liability risk from using health care artificial intelligence tools. New England Journal of Medicine. 2024 Jan 18;390(3):271-8.
Sahoo RK, Sahoo KC, Negi S, Baliarsingh SK, Panda B, Pati S. Health professionals' perspectives on the use of Artificial Intelligence in healthcare: A systematic review. Patient Educ Couns. 2025 May;134:108680. doi: 10.1016/j.pec.2025.108680. Epub 2025 Jan 27.
Taylor MA. Impact of Artificial Intelligence on Patient Safety Events: Preliminary Exploration of Events Reported to the PA-PSRS Database. Patient Safety. 2025 Oct 27;7(2).
Townsend BA, Hodge VJ, Richardson H, Calinescu R, Arvind TT. Cautious optimism: public voices on medical AI and sociotechnical harm. Frontiers in Digital Health. 2025 Sep 23;7:1625747.
"...Large language models (LLMs) are routinely used by physicians and patients for medical advice, yet their clinical safety profiles remain poorly characterized. We present NOHARM (Numerous Options Harm Assessment for Risk in Medicine), a benchmark using 100 real primary-care-to-specialist consultation cases to measure harm frequency and severity from LLM-generated medical recommendations. NOHARM covers 10 specialties, with 12,747 expert annotations for 4,249 clinical management options. Across 31 LLMs, severe harm occurs in up to 22.2% (95% CI 21.6-22.8%) of cases, with harms of omission accounting for 76.6% (95% CI 76.4-76.8%) of errors. Safety performance is only moderately correlated (r = 0.61-0.64) with existing AI and medical knowledge benchmarks. The best models outperform generalist physicians on safety (mean difference 9.7%, 95% CI 7.0-12.5%), and a diverse multi-agent approach reduces harm compared to solo models (mean difference 8.0%, 95% CI 4.0-12.1%). Therefore, despite strong performance on existing evaluations, widely used AI models can produce severely harmful medical advice at nontrivial rates, underscoring clinical safety as a distinct performance dimension necessitating explicit measurement."
Disclaimer
Note: Please use your critical reading skills while reading entries. No warranties, implied or actual, are granted for any health or medical search or AI information obtained while using these pages. Check with your librarian for more contextual, accurate information.