Making Dementia Blood-based Biomarker Data More Interpretable Through Machine Learning







Background: Research has linked many possible factors to the cause and progression of Alzheimer’s disease and dementia. These include traits such as age, gender, ethnicity, and specific blood-based biomarkers. A great deal of work has gone into gathering this information, but comparatively little has been done to consolidate and present it in a coherent, easily comprehensible form. This study aims to use machine learning to sort data relevant to Alzheimer’s disease and to make it more interpretable through visualization.
Methods: The data analyzed were collected from n = 1705 Hispanic and non-Hispanic participants with and without cognitive impairment (n = 1328 NC, n = 261 MCI, n = 116 AD) from the HABS-HD cohort. Factors measured and considered for each participant included gender, ethnicity (Hispanic or non-Hispanic), education level, and various blood biomarker levels (CRP, FABP3, IL-10, IL-6, Ab40, Ab42, Tau, NFL, PPY, sICAM-1, sVCAM-1, TNF-alpha, GLP-1, Glucagon, PYY, Insulin, HOMA-IR). A decision tree classifier from the scikit-learn Python library was applied to the dataset, with multiple parameters used in generating the tree. The dtreeviz package was also applied to provide further visualization of the data.
Results: Decision trees were successfully generated from the participant data set based on cognitive status and blood-based biomarkers for Alzheimer’s disease and dementia, as were visualized versions of these trees. The quality and parameters of the decision trees, as well as the appearance of the visualizations, could be modified. There appear to be some limitations in the scikit-learn and dtreeviz packages that could warrant further troubleshooting or acknowledgement.
Conclusion: The results indicate that visualized decision trees can be generated from a large data set. Compared with raw data tables or unvisualized decision trees, such visualizations are much simpler to interpret and make patterns easier to recognize. These patterns could help identify future areas of study or corroborate already completed studies.
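The workflow outlined in Methods can be illustrated with a minimal sketch. It is not the study's code: the data below are synthetic stand-ins (not HABS-HD measurements), only four of the listed biomarker names are used as column labels, and the parameter values (`max_depth`, `min_samples_leaf`) are illustrative assumptions. The tree is rendered as text with scikit-learn's `export_text`; the graphical dtreeviz rendering described above is omitted here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hypothetical stand-in data: random values, NOT real biomarker measurements.
feature_names = ["Ab40", "Ab42", "Tau", "NFL"]
class_names = ["NC", "MCI", "AD"]
n_participants = 300
X = rng.normal(size=(n_participants, len(feature_names)))

# Synthetic labels driven by the "NFL" and "Tau" columns so that the tree
# has a pattern to learn. This relationship is an assumption for the demo.
score = X[:, 3] + 0.5 * X[:, 2]
y = np.digitize(score, bins=[0.5, 1.5])  # 0 = NC, 1 = MCI, 2 = AD

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative parameters: a shallow tree is easier to read once visualized.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10, random_state=0)
clf.fit(X_train, y_train)

# Plain-text rendering of the fitted tree's split rules.
tree_text = export_text(clf, feature_names=feature_names)
accuracy = clf.score(X_test, y_test)

print(tree_text)
print(f"held-out accuracy: {accuracy:.2f}")
```

Limiting the depth and requiring a minimum number of samples per leaf trades some accuracy for a tree that stays small enough to read, which is the property that matters when the goal is interpretability.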