This study aims to develop a methodology to retrieve, harmonise and evaluate the completeness of national body mass index (BMI) data from linked electronic health record (EHR) sources to build a longitudinal research-ready data asset (RRDA).
A longitudinal study of BMI records spanning 23 years (1 January 2000 to 31 December 2022) from four data sources.
The national BMI RRDA is created within the Secure Anonymised Information Linkage (SAIL) Databank, encompassing the entire population of Wales, UK.
We built a methodology that provides a reproducible framework for extracting and harmonising BMI data from four major linked EHRs across two age groups: children and young people (CYP; 2–18 years old) and adults (19 years and older). The methodology is adaptable across different trusted research environments. We evaluated the completeness and retention of records over 1-, 5- and 23-year periods by calculating the proportion of missing data relative to each year’s population.
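The completeness calculation described above can be illustrated with a minimal sketch. It assumes BMI records are available as simple (person ID, measurement date, BMI) tuples and that the denominator is a set of person IDs. The function name `coverage_with_lookback`, the record layout and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from datetime import date


def coverage_with_lookback(records, population_ids, reference_year, lookback_years):
    """Return (coverage, missingness, latest) for a look-back window.

    Hypothetical helper, not the study's implementation.
    records: iterable of (person_id, measurement_date, bmi) tuples.
    population_ids: set of person IDs forming the denominator.
    A person is covered if they have at least one BMI record dated within
    the `lookback_years` calendar years ending in `reference_year`.
    """
    if not population_ids:
        raise ValueError("population_ids must not be empty")

    window_start = date(reference_year - lookback_years + 1, 1, 1)
    window_end = date(reference_year, 12, 31)

    # Keep only the most recent in-window reading for each person.
    latest = {}
    for person_id, measured_on, bmi in records:
        if person_id not in population_ids:
            continue
        if not (window_start <= measured_on <= window_end):
            continue
        if person_id not in latest or measured_on > latest[person_id][0]:
            latest[person_id] = (measured_on, bmi)

    coverage = len(latest) / len(population_ids)
    return coverage, 1.0 - coverage, latest


# Toy data (fabricated purely for illustration, not study data).
population = {"a", "b", "c", "d"}
records = [
    ("a", date(2015, 6, 1), 23.0),
    ("a", date(2022, 3, 1), 24.1),
    ("b", date(2019, 1, 15), 30.2),
    ("c", date(2001, 5, 5), 18.5),
]

for years in (1, 5, 23):
    cov, miss, _ = coverage_with_lookback(records, population, 2022, years)
    print(f"{years:>2}-year window: coverage={cov:.0%}, missingness={miss:.0%}")
```

In this sketch, widening the window from 1 to 5 to 23 years can only keep or increase coverage, because each longer window contains the shorter ones. That matches the direction of the retention comparison, although the study's actual denominators and age-group handling may differ.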
We retrieved 53.4 million records for 3.2 million individuals across Wales from 1 January 2000 to 31 December 2022. Among these, 3% of CYP and 34% of adults had repeat BMI measurements recorded over periods ranging from 5 to 23 years. Across the entire population of Wales during this period, 49% of CYP and 26% of adults had at least one BMI reading recorded, corresponding to a missingness rate of 51% for CYP and 74% for adults. Retaining the most recently recorded BMI over 1-, 5- and 23-year intervals looking back from 2022 gave coverage rates of 10%, 33% and 68%, respectively, for CYP, and 25%, 51% and 73%, respectively, for adults.
Our findings highlight substantial variation in BMI data availability and retention between CYP and adults, and across time periods, within EHRs in Wales. Wider adoption of this methodology can support the standardised use of accessible measures such as BMI to assess disease risk in population-based studies, strengthening public health initiatives and research efforts.