Performance and safety of a fine-tuned small language model for pediatric emergency triage: A benchmark study

By: Eui Jun Lee · Jae Yun Jung · Do Kyun Kim · Joong Wan Park · Young Ho Kwak

Pediatric emergency triage is a safety-critical task, and recent studies have explored whether artificial intelligence, including language models, can support triage decision-making; however, evidence on fine-tuned open-weight language models remains limited. We conducted a retrospective benchmark study using de-identified triage records from a tertiary pediatric emergency department in Korea collected from January 2020 to April 2025. After exclusions, 74,170 encounters were included. Each encounter was reconstructed into a case-level text sequence from triage-time structured variables and nurse-authored narratives. Qwen3-8B-Base was fine-tuned with Low-Rank Adaptation and Group Relative Policy Optimization using a safety-oriented reward design and was compared with a structured-data XGBoost model on a common evaluable test subset of 14,832 encounters. The fine-tuned model achieved an accuracy of 58.60%, a macro-F1 score of 0.417, and a quadratic weighted kappa of 0.535. Within-one-level agreement was 97.13%, and strict under-triage, defined as true Korean Triage and Acuity Scale levels 1 or 2 predicted as levels 4 or 5, occurred in 0.65% of cases. The structured-data comparator showed higher overall performance, with an accuracy of 69.40%, a macro-F1 score of 0.618, and a quadratic weighted kappa of 0.651. However, the fine-tuned model showed fewer extreme errors and lower strict under-triage in selected high-acuity groups, at the cost of higher over-triage. In this real-world pediatric benchmark, the fine-tuned language model did not surpass the structured-data comparator in overall performance but showed a distinct safety-oriented error profile. These findings support its potential role as a decision-support aid for human triage review rather than an autonomous triage system. External and prospective validation will be necessary before clinical implementation.
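
The agreement and safety metrics named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code. The function names are hypothetical, the quadratic weighted kappa follows the standard textbook definition, and the strict under-triage rate is assumed to use all evaluated cases as its denominator, which the abstract does not state explicitly.

```python
from collections import Counter


def within_one_level(y_true, y_pred):
    """Share of cases whose predicted level is at most one level from the true level."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


def strict_under_triage(y_true, y_pred):
    """Share of cases with true level 1 or 2 predicted as level 4 or 5.

    Assumption: the denominator is all evaluated cases, not only high-acuity ones.
    """
    hits = sum(t in (1, 2) and p in (4, 5) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, levels=(1, 2, 3, 4, 5)):
    """Cohen's kappa with quadratic disagreement weights over ordered levels."""
    n = len(y_true)
    k = len(levels)
    idx = {level: i for i, level in enumerate(levels)}
    observed = Counter((idx[t], idx[p]) for t, p in zip(y_true, y_pred))
    true_counts = Counter(idx[t] for t in y_true)
    pred_counts = Counter(idx[p] for p in y_pred)
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[(i, j)]
            den += weight * true_counts[i] * pred_counts[j] / n
    if den == 0.0:
        # Degenerate case: every label and every prediction fall in one level.
        return float("nan")
    return 1.0 - num / den
```

Quadratic weights penalize a two-level miss four times as heavily as a one-level miss, which is why this kappa is a common choice for ordinal acuity scales.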

June 4th 2026 at 16:00

Impact of COVID-19-related data drift on machine-learning prognostic models predicting 30-day opioid-related emergency department visits, hospitalisation or mortality: a population-level administrative data study in Alberta, Canada

By: Sharma, V. · Li, W. · Joon, T. · Dubois, C. · Lau, D. · Jess, E. · Lindeman, C. · Kain, N. A. · Ye, M. · Semenchuk, M. · Eurich, D. T. · Samanani, S.

Objective

To develop machine-learning (ML) models during the COVID-19 pandemic and adjacent time periods to evaluate the impact of data drift on model performance.

Design

This prognostic study used population-level administrative health data to develop ML prediction models.

Setting

Alberta, Canada during 2019–2023.

Participants

All patients over 18 years of age who received at least one opioid dispensation from a community pharmacy in the province of Alberta between 2019 and 2023.

Exposure

Each opioid dispensation served as the unit of analysis.

Main outcomes/measures

Opioid-related outcomes were identified from linked health administrative datasets. Light Gradient Boosting Machine (LightGBM) models were developed on pre-pandemic, pandemic and endemic data and temporally validated on 2023 data (the pre-pandemic model was also validated on 2020–2021 data) to predict the risk of emergency department visit, hospitalisation or mortality within 30 days of an opioid dispensation. We described key feature distributions across the study period and changes in model prediction performance on the validation sets using relevant metrics.

Results

Among 1.2 million study participants representing over 13 million opioid dispensations, there were 59 809 (2.1%), 134 402 (2.4%) and 62 143 (2.3%) events reported in the pre-pandemic (2019), pandemic (2020 and 2021) and endemic (2022) time periods, respectively (estimated 2023 validation set pre-test probability of 2.8%). Notable differences in key features were observed in the 2020–2021 model relative to other years. In the 2023 validation set, discrimination performance was highest for the pre-pandemic and endemic models compared with the pandemic model (0.81, 0.83, 0.74, respectively). A similar trend regarding changes from pre-test to post-test probabilities in higher categories of predicted risk (23%, 40%, 16%) was observed. 2020–2021 had the lowest discrimination performance (0.71) and uninformative post-test probabilities (

Conclusion

COVID-19 pandemic health data contributed to significant ML drift. Although ML approaches allow for quick re-training to mitigate drift, health regulators should approach ML prediction with caution when using pandemic-era data.

May 21st 2026 at 19:11

PLOS ONE Medicine&Health
Safety profile of metformin in adolescents with type 2 diabetes: A pharmacovigilance analysis of the FDA Adverse Event Reporting System
November 21st 2025 at 15:00

By: Mengsi Peng · Peng Shen · Kyung-In Joung · Kwang Joon Kim

Background

Although metformin is the first-line medicine for type 2 diabetes (T2D), its safety profile in adolescents remains poorly understood. This study seeks to investigate the adverse events linked to metformin use in adolescents diagnosed with T2D.

Methods

Data from the Food and Drug Administration Adverse Event Reporting System (FAERS), spanning Q1 2004 to Q2 2024, were retrospectively analyzed in this study. Adverse reactions were standardized using the Medical Dictionary for Regulatory Activities, then significant adverse drug reaction signals were identified through disproportionality analysis employing reporting odds ratio (ROR) and information component (IC) methods.
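
As a rough illustration of disproportionality analysis, the sketch below computes a reporting odds ratio with a 95% confidence interval and an information component point estimate from a 2×2 contingency table. It is not the authors' code. The function names are hypothetical, the ROR interval uses the usual log-normal approximation, and the IC uses a common shrinkage form that adds 0.5 to the observed and expected counts, which is an assumption because the abstract does not give the exact formulas. Here `a` is the number of reports with both the drug and the event, `b` the drug without the event, `c` the event without the drug, and `d` neither.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Return (ROR, lower, upper) using a log-normal confidence interval.

    Requires all four cells to be positive.
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


def information_component(a, b, c, d):
    """Return log2 of observed over expected drug-event reports, with 0.5 shrinkage.

    Assumption: the 0.5 shrinkage form; other IC estimators exist.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2((a + 0.5) / (expected + 0.5))


# Hypothetical counts, chosen only to exercise the functions.
ror, lower, upper = reporting_odds_ratio(10, 90, 100, 9800)
ic = information_component(10, 90, 100, 9800)
```

A signal is conventionally flagged when the lower bound of the interval exceeds the null value (1 for the ROR, 0 for the IC), although the thresholds applied in a given study may differ.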

Results

Of 17,956,653 FAERS reports, 80,187 involved metformin, including 973 in adolescents (10–19 years), of which 174 cases had a T2D indication. Analysis at the system organ class level revealed that congenital, familial, and genetic disorders [ROR: 8.8 (4.0, 19.3); IC: 2.2 (1.1, 2.9)] and pregnancy conditions [ROR: 4.9 (2.5, 9.5); IC: 1.8 (0.8, 2.5)] showed the most significant signals. At the preferred term (PT) level, three signals were identified across all sexes and subgroups: treatment noncompliance [ROR: overall 4.14 (2.44, 7.02), male 4.27 (2.00, 9.12), and female 4.65 (2.22, 9.74); IC: overall 1.67 (0.88, 2.22), male 1.60 (0.46, 2.36), and female 1.74 (0.60, 2.50)], lactic acidosis [IC: overall 2.99 (1.91, 3.72), male 2.53 (0.76, 3.61), and female 2.76 (1.34, 3.67)], and gastrointestinal disorder [ROR: overall 13.09 (4.73, 36.23), male 54.33 (6.05, 487.96), and female 5.34 (1.10, 25.84)]. Neurological disorders were observed only in males, while pregnancy-related adverse effects and renal disorders occurred exclusively in females. Additionally, the study identified potential new signals not documented in metformin labeling, including areflexia, muscle weakness, ataxia, decreased vibratory sense, rhabdomyolysis, substance use, and axillary pain.

Conclusion

The study reveals a complex safety profile of metformin in adolescents with T2D, warranting further research to confirm risks.


CIN: Computers, Informatics, Nursing
Effect of Infection Control Simulation Based on a Negative Pressure Isolation Room Using Mixed Reality
June 17th 2024 at 02:00

By: Kim, Kyeng-Jin · Lee, Joonyoung · Choi, Moon-Ji

This study aimed to examine the effectiveness of an infection control simulation using mixed reality, comparing it with a high-fidelity mannequin (MN) simulation group and a group receiving problem-based learning with written cases. The study used a three-group pretest-posttest quasi-experimental design. Two universities with similar curricula were selected by convenience sampling, and a total of 72 nursing students were recruited. Participants were randomly assigned to three groups of 24 each. The final analysis included 22 participants in the mixed reality group, 21 in the mannequin group, and 23 in the problem-based learning with written cases group. Data were analyzed using descriptive statistics and the χ², Kruskal-Wallis, and Wilcoxon signed rank tests. The mixed reality group had a significantly more positive effect on clinical reasoning ability and clinical competence than the problem-based learning with written cases group, whereas the mannequin group had a significantly more positive effect on clinical competence than the problem-based learning with written cases group. Mixed reality simulation is an appropriate simulation method that enhances learning immersion, satisfaction, and self-confidence in simulation. Additionally, it can substitute for mannequin simulation in terms of clinical reasoning ability and clinical competence. This study suggests that varied approaches to simulation fidelity are important for enhancing nursing students' competency across simulation outcomes.

Related tags
- FEATURE ARTICLE
