Reliability of data-driven versus expert-driven composite indicators in between-hospital comparisons on quality of oesophagogastric cancer surgery: a population-based retrospective cohort study

By: van der Linde, M.; Eijkenaar, F.; Visser, M. R.; Wijnhoven, B. P.; Lingsma, H. F.; Oude Voshaar, M. A.; the Dutch Upper Gastrointestinal Cancer Audit (DUCA) Group; Gisbertz; Hillegersberg. November 6th, 2025 at 06:00

Objective

To construct a data-driven composite from (a subset of) currently used quality indicators for oesophagogastric cancer surgery and to evaluate whether this approach enhances the reliability of between-hospital comparisons on outcome relative to the expert-driven composite indicator ‘textbook outcome (TO)’.

Design

In this retrospective cohort study, we applied Item Response Theory (IRT) to construct a data-driven continuous composite indicator reflecting a single latent variable—the quality of surgical care—and estimated latent variable scores for all individual patients. Reliability was compared between the expert-driven (TO) and data-driven (IRT) composite indicators.
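To make the idea of a latent variable score concrete, the sketch below uses a two-parameter logistic (2PL) IRT model. This is an assumption for illustration only: the abstract does not state which IRT model was fitted, and the function names, the item parameters and the coding of a favourable indicator result as 1 are all hypothetical.

```python
import math

def item_probability(theta, discrimination, difficulty):
    """2PL model: probability of a favourable result on one indicator
    for a patient with latent score theta (assumed model, not the authors')."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at a given theta.
    `items` is a list of (discrimination, difficulty) pairs."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = item_probability(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# Hypothetical item parameters, not taken from the study.
hypothetical_items = [(1.0, -1.0), (1.5, -0.5), (0.8, 0.0)]

# A patient with a favourable result on every indicator.
all_favourable = [1, 1, 1]
ll_low = log_likelihood(1.0, all_favourable, hypothetical_items)
ll_high = log_likelihood(3.0, all_favourable, hypothetical_items)
```

Under this assumed model, the log-likelihood of an all-favourable pattern keeps rising as theta increases, so no finite maximum exists and such patients are typically given a capped score. That is one plausible mechanism behind a ceiling effect, although the abstract does not describe how the maximum was set.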

Setting

All Dutch hospitals providing oesophagogastric cancer surgery.

Participants

All patients who underwent oesophagectomy (n=3588) or gastrectomy (n=1782) between 2018 and 2022 as registered in the Dutch Upper GI Cancer Audit (DUCA).

Primary and secondary outcome measures

We evaluated the reliability of between-hospital comparisons using ‘rankability’, which quantifies the proportion of observed variation in indicator scores between hospitals not attributable to chance.
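As a rough illustration, rankability is commonly computed as the between-hospital variance divided by the sum of that variance and the median squared standard error of the hospital-level estimates. The sketch below assumes that definition; the abstract does not give the exact formula used, and the function name and input numbers are hypothetical.

```python
from statistics import median

def rankability(between_variance, standard_errors):
    """Share of observed between-hospital variation not attributable to chance.

    between_variance: estimated variance of true hospital effects.
    standard_errors: standard errors of the individual hospital estimates.
    Assumes the common definition tau^2 / (tau^2 + median(se^2)).
    """
    if between_variance < 0:
        raise ValueError("between_variance must be non-negative")
    if not standard_errors:
        raise ValueError("standard_errors must not be empty")
    within_variance = median([se ** 2 for se in standard_errors])
    total = between_variance + within_variance
    if total == 0:
        raise ValueError("total variance is zero")
    return between_variance / total

# Hypothetical inputs: equal signal and noise yield a rankability of about 0.5.
example = rankability(0.04, [0.2, 0.2, 0.2])
```

Values near 1 mean that differences between hospitals mostly reflect real variation, while values near 0 mean they are mostly noise.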

Results

Seven of the 15 quality indicators were included in the IRT composite indicator. Most patients were assigned the artificial maximum of the continuous quality score (ie, a ceiling effect), resulting in similar average hospital scores. Relative to TO, rankability was higher with the IRT composite for oesophagectomy (57% vs 41%) but lower for gastrectomy (38% vs 47%).

Conclusions

The selected seven quality indicators for oesophageal and gastric cancer surgery represent a single latent variable but are not yet optimal for differentiating surgical care quality due to ceiling effects. Despite using fewer indicators, the continuous IRT score showed a promising increase in rankability for oesophagectomy, suggesting that data-driven composite indicators may enhance hospital benchmarking reliability.
