Phenotyping to predict 12-month health outcomes of older general medicine patients

Richard John Woodman, Kimberly Bryant, Michael J. Sorich, Campbell H. Thompson, Patrick Russell, Alberto Pilotto, Aleksander A. Mangoni

Research output: Contribution to journal › Article › peer-review

Abstract

Background: A variety of unsupervised learning algorithms have been used to phenotype older patients, enabling directed care and personalised treatment plans. However, the ability of the resulting clusters to discriminate accurately between older patients at different levels of risk may vary depending on the methods employed.

Aims: To compare seven clustering algorithms in their ability to develop patient phenotypes that accurately predict health outcomes. 

Methods: Data were collected for N = 737 older medical inpatients during their hospital stay, covering five types of medical data (ICD-10 codes, ATC drug codes, laboratory, clinic and frailty data). We trialled five unsupervised learning algorithms (K-means, K-modes, hierarchical clustering, latent class analysis (LCA), and DBSCAN) and two graph-based approaches (KG-Louvain and KNN-Louvain), creating a separate set of clusters for each method and data type. The resulting cluster memberships were used as inputs to a random forest classifier to predict eleven health outcomes: mortality at one, three, six and 12 months, in-hospital falls and delirium, length-of-stay, outpatient visits, and readmissions at one, three and six months.
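
The Methods step of clustering each data type separately and passing the cluster memberships to a random forest can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic, only two of the five data types and two of the seven methods are shown, and the number of clusters, the one-hot encoding, the five-fold cross-validation and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_patients = 737  # cohort size from the abstract; every value below is synthetic

# Hypothetical numeric blocks standing in for two of the five data types.
data_types = {
    "laboratory": rng.normal(size=(n_patients, 8)),
    "frailty": rng.normal(size=(n_patients, 5)),
}

# Synthetic binary outcome, loosely tied to the first frailty variable.
risk = data_types["frailty"][:, 0] + rng.normal(size=n_patients)
outcome = (risk > np.median(risk)).astype(int)


def cluster_features(blocks, make_clusterer):
    """Cluster each data type separately; return one-hot cluster memberships."""
    columns = []
    for X in blocks.values():
        labels = make_clusterer().fit_predict(X)
        columns.append(np.eye(labels.max() + 1)[labels])
    return np.hstack(columns)


# Assumed settings: four clusters per data type for both methods.
methods = {
    "K-means": lambda: KMeans(n_clusters=4, n_init=10, random_state=0),
    "hierarchical": lambda: AgglomerativeClustering(n_clusters=4, linkage="ward"),
}

auc_by_method = {}
for method, make_clusterer in methods.items():
    features = cluster_features(data_types, make_clusterer)
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    # Out-of-fold predicted probabilities for the positive class.
    proba = cross_val_predict(
        forest, features, outcome, cv=5, method="predict_proba"
    )[:, 1]
    auc_by_method[method] = roc_auc_score(outcome, proba)

for method, auc in auc_by_method.items():
    print(f"{method}: AUC = {auc:.3f}")
```

Because the classifier sees only cluster memberships, any predictive signal must come from how well each clustering method separates patients within each data type, which is the comparison the study sets out to make.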

Results: The overall median area-under-the-curve (AUC) values across the eleven outcomes for the seven methods were (from highest to lowest) 0.758 (hierarchical), 0.739 (K-means), 0.722 (KG-Louvain), 0.704 (KNN-Louvain), 0.698 (LCA), 0.694 (DBSCAN) and 0.656 (K-modes). Overall, frailty data was the most important data type for predicting mortality, ICD-10 disease codes for predicting readmissions, and laboratory data for predicting falls.
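
As a rough illustration of how an overall median AUC across outcomes can be computed for one method, the sketch below calculates each outcome's AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties counting half) and then takes the median. The labels, scores and outcome names are hypothetical toy values, not study data, and only three outcomes are shown.

```python
from statistics import median


def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive scored above a negative counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (labels, predicted scores) for three outcomes of one method.
predictions = {
    "mortality_12m": ([1, 0, 1, 0, 0], [0.9, 0.2, 0.6, 0.7, 0.1]),
    "falls": ([0, 1, 0, 0, 1], [0.3, 0.8, 0.4, 0.1, 0.2]),
    "readmission_1m": ([1, 1, 0, 0, 0], [0.5, 0.4, 0.5, 0.3, 0.2]),
}

auc_per_outcome = {name: auc(y, s) for name, (y, s) in predictions.items()}
overall_median_auc = median(auc_per_outcome.values())
print(auc_per_outcome, overall_median_auc)
```

The median is less sensitive than the mean to a single outcome that a method predicts unusually well or poorly, which makes it a reasonable single summary for ranking methods.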

Conclusions: Clusters created using hierarchical, K-means and Louvain community detection algorithms identified well-separated patient phenotypes that were consistently associated with age-related adverse health outcomes. Frailty data was the most valuable data type for predicting most health outcomes.

Original language: English
Article number: 42
Number of pages: 11
Journal: Aging Clinical and Experimental Research
Volume: 37
Issue number: 1
Publication status: Published - 22 Feb 2025

Keywords

  • Electronic health records
  • Frailty
  • Hierarchical
  • K-Means
  • Latent class analysis
  • Louvain community detection
