Quality and efficiency of integrating customised large language model-generated summaries versus physician-written summaries: a validation study

By: Schoonbeek, R. C.; Workum, J. D.; Schuit, S. C. E.; Hoekman, A. H.; Mehri, T.; Doornberg, J. N.; van der Laan, T. P.; Bootsma-Robroeks, C. M. H. H. T.; on behalf of the Applied Artificial Intelligence in Healthcare Consortium; Aalderink; van den Berg; Be

Objectives

To compare the quality and time efficiency of physician-written summaries with customised large language model (LLM)-generated medical summaries integrated into the electronic health record (EHR) in a non-English clinical environment.

Design

Cross-sectional non-inferiority validation study.

Setting

Tertiary academic hospital.

Participants

52 physicians from 8 specialties at a large Dutch academic hospital participated, either in writing summaries (n=42) or evaluating them (n=10).

Interventions

Physician writers wrote summaries of 50 patient records. LLM-generated summaries were created for the same records using an EHR-integrated LLM. An independent, blinded panel of physician evaluators compared physician-written summaries to LLM-generated summaries.

Primary and secondary outcome measures

Primary outcome measures were completeness, correctness and conciseness (on a 5-point Likert scale). Secondary outcomes were preference and trust, and time to generate either the physician-written or LLM-generated summary.

Results

The completeness and correctness of LLM-generated summaries did not differ significantly from physician-written summaries. However, LLM summaries were less concise (3.0 vs 3.5, p=0.001). Overall evaluation scores were similar (3.4 vs 3.3, p=0.373), with 57% of evaluators preferring LLM-generated summaries. Trust in both summary types was comparable, and interobserver variability showed excellent reliability (intraclass correlation coefficient 0.975). Physicians took an average of 7 min per summary, while LLMs completed the same task in just 15.7 s.
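As a rough illustration of the time difference reported above, the following sketch converts both figures to seconds and computes the ratio. The function name `speedup` is hypothetical, and the calculation assumes the two reported averages are directly comparable. It does not account for any time a physician might spend reviewing an LLM-generated summary, which the abstract does not report.

```python
# Figures taken from the abstract; everything else is illustrative.
physician_seconds = 7 * 60   # average of 7 min per physician-written summary
llm_seconds = 15.7           # reported LLM generation time per summary


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


ratio = speedup(physician_seconds, llm_seconds)
saved_per_summary = physician_seconds - llm_seconds

print(f"Speed-up: about {ratio:.1f}x")
print(f"Time saved per summary: about {saved_per_summary:.0f} s")
```

Under these assumptions the LLM is roughly 27 times faster per summary, which is a generation-time comparison only.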

Conclusions

LLM-generated summaries are comparable to physician-written summaries in completeness and correctness, although slightly less concise. With a clear time-saving benefit, LLMs could help reduce clinicians’ administrative burden without compromising summary quality.

Uncovering gaps in workforce well-being: a national look at survey practice in Dutch university medical centres - an exploratory quantitative study

Introduction

Maintaining a healthy workforce is crucial for safe, high-quality care. To enhance well-being and engagement in Dutch university medical centres (UMCs), an overview of staff well-being and job perceptions is needed first. Surveys are widely used to improve working conditions, but varying questionnaires hinder a comprehensive view. This study aimed to evaluate the content of employee surveys currently used in UMCs in the Netherlands from a well-being perspective and to analyse the survey results at a national level.

Methods

All seven UMCs were approached to participate in the study and share employee survey data. A secondary analysis of these data was conducted, with work experience as the primary outcome of interest. Items were categorised following the Job Demands-Resources model. Descriptive statistics were presented as percentages, means and medians with IQRs.
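The descriptive statistics named above can be sketched for a single survey item as follows. The responses are invented for illustration and are not study data. The helper `describe_item`, the 1 to 5 scale and the "score of 4 or higher" threshold are all assumptions, as is the choice of the inclusive quartile method, since the abstract does not state how IQRs were computed.

```python
import statistics

# Hypothetical responses to one survey item on a 1-5 scale (not study data).
responses = [2, 3, 3, 4, 4, 4, 4, 5, 5, 5]


def describe_item(scores: list[int], threshold: int = 4) -> dict[str, float]:
    """Summarise one item: mean, median, quartiles, and % at or above threshold."""
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    at_or_above = sum(1 for s in scores if s >= threshold)
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "q1": q1,
        "q3": q3,
        "pct_at_or_above": 100 * at_or_above / len(scores),
    }


summary = describe_item(responses)
print(summary)
```

Reporting the median with its quartiles rather than the mean alone is a common choice for ordinal Likert-type items, where the distance between response options is not necessarily equal.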

Results

Two UMCs participated and 31 862 completed surveys were included. Variation was found in survey items (eg, 15–18 subcategories, 21–33 question items), response options (eg, 1–5, 1–10), frequency (1–3 times per year) and timing. Scores on the following outcomes are presented: work overload, coworker support, job control, organisational justice, participation in decision-making, performance feedback, possibilities for learning and development, recognition, task variety, team atmosphere, team effectiveness, trust in leadership, other job resources, connecting/inspiring leadership, self-efficacy, goal-directiveness, boredom, burnout, job satisfaction, work engagement, other employee well-being, commitment organisation/team and work ability. Results should be interpreted with caution. In hospital A only, median scores of 2 or 3 were observed for certain job control items, whereas the majority of other question items had a median score of 4.

Conclusions

There is a significant lack of cohesion across employee surveys. As it stands, employee surveys in Dutch UMCs are not effective tools for monitoring the work experience or well-being of the healthcare workforce. While these surveys may support management decisions, this support is not reflected in interventions related to work and the work environment.
