TY - JOUR
T1 - Alzheimer's disease progression detection model based on an early fusion of cost-effective multimodal data
AU - El-Sappagh, Shaker
AU - Saleh, Hager
AU - Sahal, Radhya
AU - Abuhmed, Tamer
AU - Islam, S. M. Riazul
AU - Ali, Farman
AU - Amer, Eslam
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2016R1D1A1A03934816).
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2021/2/1
Y1 - 2021/2/1
AB - Alzheimer's disease (AD) is a severe neurodegenerative disease. Identifying patients at high risk of conversion from mild cognitive impairment to AD, through earlier close monitoring, targeted investigations, and appropriate management, is crucial. Recently, several machine learning (ML) algorithms have been used for AD progression detection. Most of these studies used only neuroimaging data from baseline visits. However, AD is a complex chronic disease, and a medical expert usually analyzes the patient's whole history when making a progression diagnosis. Furthermore, neuroimaging data are often limited or unavailable, especially in developing countries, because of their cost. In this paper, we compare the performance of five widely used ML algorithms, namely the support vector machine, random forest, k-nearest neighbor, logistic regression, and decision tree, in predicting AD progression with a prediction horizon of 2.5 years. We use 1029 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. In contrast to previous literature, our models are optimized using a collection of cost-effective time-series features including patients' comorbidities, cognitive scores, medication history, and demographics. Medication and comorbidity text data are semantically prepared. Drug terms are collected and cleaned before encoding using the Anatomical Therapeutic Chemical (ATC) classification ontology, and then semantically aggregated to the appropriate level of granularity using ATC to ensure a less sparse dataset. Our experiments show that the early fusion of comorbidity and medication features with other features yields significant predictive power with all models. The random forest model achieves the highest accuracy among the compared models. This study is the first of its kind to investigate the role of such multimodal time-series data in AD prediction.
KW - Alzheimer disease
KW - Disease progression detection
KW - Machine learning
KW - Multimodal data analysis
UR - http://www.scopus.com/inward/record.url?scp=85092710449&partnerID=8YFLogxK
U2 - 10.1016/j.future.2020.10.005
DO - 10.1016/j.future.2020.10.005
M3 - Article
AN - SCOPUS:85092710449
VL - 115
SP - 680
EP - 699
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
SN - 0167-739X
ER -