Comparison of support vector machine, random forest, extreme gradient boosting and lasso and elastic-net regularized generalized linear model for Alzheimer's Disease prediction

Date

2021

Authors

Zhang, Fan
Petersen, Melissa
Johnson, Leigh
Hall, James
O'Bryant, Sid

Abstract

Purpose: Machine learning-based blood tests show promise in detecting Alzheimer's disease (AD) and in pinpointing the mechanisms underlying neurodegeneration. Model selection plays a crucial role in building good machine learning models for AD prediction.

Methods: This paper compares four machine learning algorithms for AD prediction using serum blood test data: support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and lasso and elastic-net regularized generalized linear model (GLMNET). First, we used 5-fold cross-validation repeated 10 times, randomly splitting the data into training and testing sets 50 times, to select the best hyperparameters for each method. We then selected the best model based on its performance metrics in the testing set.

Results: In the training set, RF and XGBoost achieved the highest negative predictive value (100%), followed by SVM with 99.40% and GLMNET with 94.45%. In the testing set, SVM achieved the highest negative predictive value (96.96%), followed by XGBoost with 95.94%, RF with 95.59%, and GLMNET with 94.27%. On a 28-core high-performance computing system, RF used 1.35 hours of CPU time, SVM 1.10 hours, XGBoost 48 seconds, and GLMNET 47 seconds.

Conclusions: SVM, RF, and XGBoost were the three best-performing models for AD prediction. SVM handled overfitting on the small training set better than RF and XGBoost, and it also achieved the best performance in the testing set.
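The resampling scheme and the headline metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say which software or splitting routine was used, and the function names, the seed, the toy sample size of 20, and the unstratified shuffling are all assumptions made for illustration. It shows how 10 repeats of 5-fold cross-validation yield 50 train/test splits, and how negative predictive value (NPV) is computed as TN / (TN + FN).

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Each repeat reshuffles the sample indices and cuts them into n_folds
    folds, so the generator produces n_repeats * n_folds splits in total.
    Assumption: plain (unstratified) shuffling; a real study with imbalanced
    classes would likely stratify by diagnosis.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal shuffled indices into folds round-robin so sizes differ by at most 1.
        folds = [idx[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test


def negative_predictive_value(y_true, y_pred, negative=0):
    """Return TN / (TN + FN): the share of predicted negatives that are truly negative."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if p == negative and t == negative)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p == negative and t != negative)
    if tn + fn == 0:
        return float("nan")  # no negative predictions, so NPV is undefined
    return tn / (tn + fn)


# Hypothetical toy data: 20 samples, 10 repeats of 5-fold CV.
splits = list(repeated_kfold(n_samples=20))
npv_example = negative_predictive_value([0, 0, 1, 1], [0, 0, 0, 1])
```

In a full pipeline, each candidate hyperparameter setting would be fitted on every training split and scored on the matching held-out split, with the scores averaged across all 50 splits before choosing a setting.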
