A hybrid machine learning approach for prediction of conversion from mild cognitive impairment to dementia

Magda Bucholc, Sofya Titarenko, Xuemei Ding, Callum Canavan, Tianhua Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Mild cognitive impairment (MCI) represents a precursor to dementia for many individuals; however, some forms of MCI tend to remain stable over time and do not progress to dementia (Jicha et al., 2006; Petersen et al., 1999; Visser et al., 2006). In fact, conversion rates vary substantially depending on the diagnostic criteria used, the nature of the analytic sample, and the clinical setting (Ganguli et al., 2004; Ritchie, Artero, & Touchon, 2001). To identify personalized strategies to prevent or slow the progression of dementia and to support the clinical development of novel treatments, we need new approaches for modelling disease progression that can differentiate between progressive and non-progressive MCI subjects. The aim of this study was to develop a novel prognostic machine learning (ML) framework utilising longitudinal information encoded in efficient, cost-effective, and noninvasive markers to identify MCI subjects who are at risk of developing dementia. Our approach was developed using the dataset from the National Alzheimer’s Coordinating Center. We built two prognostic models based on patient data from 3 assessment visits (Model 1, n=768) and 4 assessment visits (Model 2, n=409). A novel hybrid prognostic approach, using cognitive trajectory classes generated through unsupervised learning (Stage 1) as input to supervised ML models (Stage 2), was developed and systematically tested. Our unsupervised learning approach (Stage 1) involved: (i) implementation of a longitudinal data partitioning method that clusters trajectories based on their shapes; (ii) validation of the optimal number of clusters using three different Clustering Validity Indices (CVIs); and (iii) application of fusion-based methods for combining the CVIs into fused normalized CVI scores, averaged for each cluster partition, to determine the final number of trajectory classes for each type of clinical score.
In Stage 2, we built four types of prognostic models based on random forest (RF), support vector machine (SVM), logistic regression (LR), and kNN ensemble approaches. Classification models using both clinical scores and cognitive trajectory classes as input showed up to 6.5% higher accuracy than models based only on clinical scores (p < 0.05 in all cases). Given patient data from three time points (Model 1), the highest prediction accuracy was achieved by the ensemble and RF models, i.e., 85.0% (standard deviation: 3.1%) and 84.6% (4.1%), respectively. Using patient data from four time points (Model 2), the highest accuracy was achieved by the RF and ensemble models, i.e., 87.5% (6.1%) and 86.8% (3.7%), respectively. We showed that incorporating the output of unsupervised learning significantly improved the performance of supervised ML models. Our prognostic framework can be applied to improve recruitment in clinical trials and to select early interventions for individuals at high risk of developing dementia.
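The CVI fusion step described for Stage 1 can be illustrated with a minimal sketch. It assumes min-max normalization and an unweighted mean, which the abstract does not specify; the index names, the direction flags, and the numeric scores are all hypothetical and are not taken from the study.

```python
def min_max(values):
    """Scale a list of scores to [0, 1]; a constant list maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_cvis(cvi_scores, higher_is_better):
    """Average normalized CVI scores for each candidate number of clusters.

    cvi_scores: {index_name: {k: raw_score}}
    higher_is_better: {index_name: bool}
    Returns {k: fused_score}; a larger fused score means a better partition.
    """
    ks = sorted(next(iter(cvi_scores.values())))
    fused = {k: 0.0 for k in ks}
    for name, by_k in cvi_scores.items():
        normalized = min_max([by_k[k] for k in ks])
        # Flip indices where a smaller raw value indicates better clustering.
        if not higher_is_better[name]:
            normalized = [1.0 - v for v in normalized]
        for k, v in zip(ks, normalized):
            fused[k] += v / len(cvi_scores)
    return fused


def best_k(fused):
    """Return the number of clusters with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical scores for three placeholder indices and k = 2, 3, 4.
example_scores = {
    "index_a": {2: 0.40, 3: 0.55, 4: 0.50},
    "index_b": {2: 120.0, 3: 180.0, 4: 150.0},
    "index_c": {2: 1.20, 3: 0.80, 4: 1.00},
}
example_direction = {"index_a": True, "index_b": True, "index_c": False}

example_fused = fuse_cvis(example_scores, example_direction)
print(best_k(example_fused))  # prints 3
```

Normalizing before averaging keeps an index with a large numeric range from dominating the fused score; other fusion rules (for example rank-based or weighted schemes) would slot into `fuse_cvis` in the same way.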
Original language: English
Journal: Expert Systems with Applications
Publication status: Accepted/In press - 12 Jan 2023
