
Machine Learning-driven Probability Calculators Can Accurately Predict 1-year Mortality After Proximal Humerus Fractures in Patients Over the Age of 65 Years

  • Stijn R.J. Mennes
  • Sebastian Engbers
  • Bjarty L. Garcia
  • Reinier W.A. Spek
  • Roelina Munnik-Hagewoud
  • Rutger G. Zuurmond
  • Ruurd L. Jaarsma
  • Job N. Doornberg
  • Michel P.J. van den Bekerom
  • Machine Learning Consortium

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Background: Proximal humerus fractures (PHFs) in patients ≥ 65 years of age are associated with increased risk of death in the months after injury. Controversy exists regarding the preferred treatment strategy in these patients, and operative treatment is associated with high complication and reoperation rates. Machine learning (ML)-driven probability calculators for mortality prediction therefore may be valuable during shared decision-making for surgeons and patients. 

Questions/purposes: (1) To develop ML algorithms to predict 1-year mortality after a PHF in patients ≥ 65 years of age. (2) To externally validate all algorithms on a geographically distinct patient population. (3) To create an easy-to-use, online calculator that can be used by surgeons at the point of care to enable more informed decision-making.

Methods: This study identified 5114 potentially eligible patients age ≥ 65 years who presented to our two hospitals in Holland (one is a Level 1 trauma center and one is a Level 2 trauma center) between January 2016 and December 2023. Of those, we considered 3488 patients eligible because they were ≥ 65 years of age and had a first-time PHF. Of these 3488 patients, 10% (334) were excluded because of misdiagnosis, bilateral PHFs, or a history of previous PHFs, and 4% (155) were excluded because their mortality status was irretrievable or their data sets were incomplete, leaving 86% (2999) for the analysis. Data on 24 potential factors associated with increased mortality after PHFs were collected. Treatment (surgical or nonoperative) was not included as a predictor because the aim was to predict 1-year mortality at the moment a PHF was sustained, before a treatment choice had been made; excluding treatment modalities therefore does not limit the intended use as a pretreatment risk estimation model. Four ML algorithms were developed: logistic regression, extreme gradient boosting machine (XGBoost), random forest, and LightGBM. The ML algorithms were trained and internally validated on patients from the first hospital (59% [1768 of 2999]) and externally validated on a geographically distinct group of patients from the second hospital (41% [1231 of 2999]). The mean ± SD age in the training cohort was 77 ± 8 years, and it was 76 ± 8 years in the external validation set; 79% (2383 of 2999) of patients were female. The overall 1-year mortality rate was 11% (325 of 2999). Performance was assessed with discrimination and calibration curves, and overall performance was assessed using the Brier score. Discrimination was assessed with the c-statistic, which is the area under the receiver operating characteristic curve. The c-statistic ranges from 0.50 to 1.0, with 0.50 indicating discrimination no better than chance and 1.0 indicating perfect discriminating ability.
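As an illustration of the c-statistic described above, the following minimal sketch computes it directly as a concordance probability: the share of (deceased, survivor) patient pairs in which the deceased patient received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy outcomes and probabilities are hypothetical and are not taken from the study data or the authors' code.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Concordance probability, equal to the area under the ROC curve.

    y_true: 0/1 outcomes (1 = died within 1 year), y_prob: predicted risks.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0   # pair ranked correctly
        elif p_event == p_non_event:
            concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data, for illustration only.
toy_outcomes = [0, 0, 1, 0, 1, 1]
toy_risks = [0.05, 0.20, 0.30, 0.10, 0.80, 0.15]
toy_c = c_statistic(toy_outcomes, toy_risks)  # 8 of 9 pairs are concordant
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; rank-based implementations are preferable for large data sets.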
Calibration was assessed by plotting the agreement between the observed outcome and predicted probability, and the intercept and slope were determined. The plot's intercept indicates whether predictions were too high (intercept < 0) or too low (intercept > 0). The slope reflects either overfitting (predictions too extreme, slope < 1) or underfitting (predictions not extreme enough, slope > 1). An ideal prediction model has a calibration curve with an intercept of 0 and a slope of 1. The Brier score reflects the overall performance, a composite of discrimination and calibration. A score of 0 reflects perfect prediction and 1 indicates the worst prediction. Negative and positive predictive values were also assessed. For internal validation, fivefold cross-validation was performed to prevent data leakage, and 1000-fold bootstrapping was used to ensure robust results and account for optimism. Cross-validation entails dividing the training set into five subsets; the models are trained on four of them, and the fifth, unseen subset is used for internal validation, which prevents overestimation of model performance. For external validation, performance was assessed using only 1000-fold bootstrapping to ensure robust results and correct for optimism.
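The calibration and Brier score measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the calibration intercept and slope are estimated jointly by regressing the observed outcome on the logit of the predicted probability (one common approach; the source does not specify the estimation method), and the function names and toy data are hypothetical. Predicted probabilities must lie strictly between 0 and 1.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_intercept_slope(y_true, y_prob, n_iter=50, tol=1e-10):
    """Fit logit(P(y = 1)) = a + b * logit(y_prob) by Newton-Raphson.

    Returns (a, b). Assumption: intercept and slope are fitted jointly.
    """
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        mu = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(y - m for y, m in zip(y_true, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(y_true, mu, x))
        # Entries of the 2x2 information matrix.
        w = [m * (1.0 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Hypothetical toy cohort in which observed event rates match the predictions:
# 2 of 10 events at risk 0.2, 5 of 10 at risk 0.5, 8 of 10 at risk 0.8.
toy_outcomes = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
toy_risks = [0.2] * 10 + [0.5] * 10 + [0.8] * 10

toy_brier = brier_score(toy_outcomes, toy_risks)
toy_intercept, toy_slope = calibration_intercept_slope(toy_outcomes, toy_risks)
```

Because the toy predictions agree exactly with the observed event rates in each risk group, the fitted intercept and slope come out at the ideal values of 0 and 1.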

Results: The algorithms performed similarly, with c-statistics (discriminative ability) ranging from 0.80 to 0.81 (95% confidence interval [CI] 0.72 to 0.86) in internal validation and from 0.83 to 0.85 (95% CI 0.81 to 0.86) in external validation. C-statistics exceeding 0.80 are generally considered to indicate strong performance for mortality prediction models in geriatric trauma populations. Logistic regression was chosen as the best model from among those evaluated because of adequate calibration and interpretability. The calibration indicates that the model is not subject to overfitting or underfitting and does not predict too high or too low. Logistic regression is interpretable because it needs fewer predictors and provides comprehensible coefficients. The negative predictive value was 0.91 (95% CI 0.90 to 0.92), and the positive predictive value was 0.66 (95% CI 0.54 to 0.81). The factors most strongly associated with mortality were hemiplegia, prefracture residence in a healthcare institution, and heart failure.
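For readers less familiar with predictive values, the sketch below shows how negative and positive predictive values follow from predicted risks once a classification threshold is chosen. The threshold of 0.25, the function name `predictive_values`, and the toy data are all hypothetical; the abstract does not report which threshold was used.

```python
def predictive_values(y_true, y_prob, threshold):
    """Return (ppv, npv) when risks >= threshold are classified as high risk."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_high = p >= threshold
        if predicted_high and y == 1:
            tp += 1
        elif predicted_high and y == 0:
            fp += 1
        elif not predicted_high and y == 0:
            tn += 1
        else:
            fn += 1
    # PPV: share of high-risk predictions that died; NPV: share of
    # low-risk predictions that survived. Undefined if a group is empty.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical toy data and threshold, for illustration only.
toy_outcomes = [0, 0, 1, 0, 1, 1, 0, 0]
toy_risks = [0.05, 0.20, 0.30, 0.10, 0.80, 0.15, 0.60, 0.02]
toy_ppv, toy_npv = predictive_values(toy_outcomes, toy_risks, threshold=0.25)
```

Both values depend on the chosen threshold and on the mortality rate of the cohort, so they should be read alongside the c-statistic and calibration results rather than in isolation.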

Conclusion: This study developed and externally validated an ML-driven prediction model that accurately provides the 1-year mortality risk for an individual patient. This tool for prognosis estimation can be used by physicians during shared decision-making and patient counseling, as it enhances the informed consent process by providing patients and families with realistic expectations when considering treatment options for PHFs. The prediction tool was incorporated into a freely available web application and can be accessed through https://bjarty.shinyapps.io/mortality_app/ . 

Level of Evidence: Level III, therapeutic study.

Original language: English
Pages (from-to): 1020-1032
Number of pages: 13
Journal: Clinical Orthopaedics and Related Research
Volume: 484
Issue number: 5
DOIs
Publication status: Published - May 2026
Externally published: Yes

Keywords

  • orthopaedics
  • proximal humeral fractures
  • mortality
  • machine learning
