Evaluating Artificial Intelligence–Generated Nursing Care Plans: A Scenario‐Based Comparative Study of Accuracy, Completeness, Quality, and Readability

By: Dilek Yilmaz Akyaz · Deniz Esim · Gokce Naz Cakir · Elif Aylin Basut · Seyma Tufekci · Mukaddes Konyar · Ozge Yaman · Emre Mor · Sukran Musaoglu · Oguzhan Karaman · Arzu Baygul Eden

ABSTRACT

Aim

This study aimed to evaluate the ability of three generative artificial intelligence tools (ChatGPT, Gemini and DeepSeek) to generate clinically accurate, comprehensive, and readable nursing care plans aligned with standardised nursing taxonomies (North American Nursing Diagnosis Association International, Nursing Interventions Classification, and Nursing Outcomes Classification). The study further explored variations in tool performance across different nursing specialties.

Design

A descriptive comparative design was used.

Methods

Ten expert-validated clinical scenarios representing five nursing specialties (Fundamentals of Nursing, Medical, Surgical, Paediatric and Psychiatric Nursing) were presented to the three artificial intelligence tools. Each tool responded to four standardised prompts based on the latest North American Nursing Diagnosis Association International, Nursing Interventions Classification and Nursing Outcomes Classification taxonomies. Outputs were assessed for quality, accuracy, completeness and readability by expert evaluators using validated scales.

Results

All tools produced nursing care plans of moderate-to-high quality. DeepSeek demonstrated slightly higher accuracy and completeness than Gemini and ChatGPT. Surgical nursing scenarios yielded the highest performance, likely reflecting the more protocolised and pathway-driven nature of perioperative care. However, all outputs were incomplete and written at a college reading level, limiting accessibility for clinical use.

Conclusion

Generative artificial intelligence tools can support the production of structured nursing care plans, but their limitations in completeness and readability indicate that their outputs should be regarded only as preliminary drafts requiring expert review and adaptation, particularly in less standardised clinical domains.

Impact

The study examined whether generative artificial intelligence can reliably assist in creating nursing care plans. All tools performed moderately well, with DeepSeek showing slight advantages, but outputs were incomplete and difficult to read. Findings are relevant to clinical nurses, educators, healthcare managers and policymakers worldwide who are exploring artificial intelligence in nursing workflows.

Reporting Method

This study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

Patient or Public Contribution

This study did not include patient or public involvement in its design, conduct or reporting.
