Browsing by Subject "machine learning"

Now showing 1 - 7 of 7

Accelerating Hyperparameter Tuning in Machine Learning for Alzheimer's Disease With High Performance Computing
(Frontiers Media S.A., 2021-12-08) Zhang, Fan; Petersen, Melissa E.; Johnson, Leigh A.; Hall, James R.; O'Bryant, Sid E.
Driven by massive datasets that comprise biomarkers from both blood and magnetic resonance imaging (MRI), the need for advanced learning algorithms and accelerator architectures, such as GPUs and FPGAs, has increased. Machine learning (ML) methods have delivered remarkable predictive performance for the early diagnosis of Alzheimer's disease (AD). Although ML has improved the accuracy of AD prediction, it relies on increasingly complex procedures, such as hyperparameter tuning, which in turn increase its computational complexity. Thus, accelerating high performance ML for AD is an important research challenge facing these fields. This work reports a multicore high performance support vector machine (SVM) hyperparameter tuning workflow with 100 times repeated 5-fold cross-validation for speeding up ML for AD. For demonstration and evaluation purposes, the high performance hyperparameter tuning model was applied to public MRI data for AD and included demographic factors such as age, sex, and education. Results showed that computational efficiency increased by 96%, which helped to shed light on future diagnostic AD biomarker applications. The high performance hyperparameter tuning model can also be applied to other ML algorithms such as random forest, logistic regression, and XGBoost.
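As a rough illustration of why repeated k-fold tuning parallelizes well, the sketch below breaks a grid search into independent (parameters, split) tasks. It is not the authors' workflow: the names `repeated_kfold`, `tune`, and `toy_score`, and the `cost`/`gamma` grid values, are hypothetical, and the scoring function is a stand-in for fitting and evaluating an SVM.

```python
import itertools
import math
import random


def repeated_kfold(n_samples, k=5, repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # fresh shuffle for every repeat
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms fold f
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield r, f, train, test


def _run(task):
    """Evaluate one (score_fn, params, train, test) task."""
    score_fn, params, train, test = task
    return score_fn(params, train, test)


def tune(grid, n_samples, score_fn, k=5, repeats=100, map_fn=map):
    """Return (best_params, mean_score) averaged over all repeats and folds.

    map_fn defaults to the serial built-in map; a pool's map can be
    passed instead because the tasks do not depend on each other.
    """
    splits = list(repeated_kfold(n_samples, k, repeats))
    best = None
    for params in grid:
        tasks = [(score_fn, params, tr, te) for _, _, tr, te in splits]
        scores = list(map_fn(_run, tasks))
        mean = sum(scores) / len(scores)
        if best is None or mean > best[1]:
            best = (params, mean)
    return best


def toy_score(params, train, test):
    # Hypothetical placeholder: a real workflow would fit a model on
    # train and score it on test. This toy peaks at cost=10, gamma=0.01.
    return (-abs(math.log10(params["cost"]) - 1)
            - abs(math.log10(params["gamma"]) + 2))


grid = [{"cost": c, "gamma": g}
        for c, g in itertools.product([0.1, 1, 10, 100], [0.001, 0.01, 0.1])]
best_params, best_mean = tune(grid, n_samples=50, score_fn=toy_score, repeats=2)
print(best_params)
```

To spread the tasks over several cores, one option is to pass the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn`, provided the scoring function is defined at module level so it can be pickled.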
Hyperparameter Tuning with High Performance Computing Machine Learning for Imbalanced Alzheimer's Disease Data
(MDPI, 2022-11-17) Zhang, Fan; Petersen, Melissa E.; Johnson, Leigh A.; Hall, James R.; O'Bryant, Sid E.
Accurate detection is still a challenge in machine learning (ML) for Alzheimer's disease (AD). Class imbalance is another major challenge for machine-learning algorithms that assume the data are evenly distributed across classes. Here, we present a hyperparameter tuning workflow with high-performance computing (HPC) for imbalanced data related to prevalent mild cognitive impairment (MCI) and AD in the Health and Aging Brain Study-Health Disparities (HABS-HD) project. We applied a single-node multicore parallel mode to hyperparameter tuning of gamma, cost, and class weight using a support vector machine (SVM) model with 10 times repeated fivefold cross-validation. We executed the hyperparameter tuning workflow with R's bigmemory, foreach, and doParallel packages on Texas Advanced Computing Center (TACC)'s Lonestar6 system. The computational time was dramatically reduced by up to 98.2% for the high-performance SVM hyperparameter tuning model, and the performance of cross-validation was also improved (the positive predictive value and the negative predictive value at base rate 12% were, respectively, 16.42% and 92.72%). Our results show that a single-node multicore parallel structure and high-performance SVM hyperparameter tuning model can deliver efficient and fast computation and achieve outstanding agility, simplicity, and productivity for imbalanced data in AD applications.
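The abstract reports predictive values "at base rate 12%" without stating the sensitivity and specificity behind them. The sketch below shows the standard Bayes' rule conversion from sensitivity, specificity, and base rate to positive and negative predictive values. The function name and the input values are hypothetical and are not taken from the study.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a test applied at a given base rate."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    false_neg = (1 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical inputs, chosen only to illustrate the calculation.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.90, base_rate=0.12)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

At a low base rate, PPV is limited by the number of false positives drawn from the much larger negative class, which is one reason predictive values are usually quoted together with the base rate.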
Improving Human Identification Using the Human Skin Microbiome
(2021-12) Sherier, Allison J.; Budowle, Bruce; Leudtke, Robert; Phillips, Nicole R.
There are times when biological evidence contains human DNA of too low quality or quantity to provide enough information for human identification (HID). However, nucleic acids from the human skin microbiome are sources of genetic material that may be useful for HID. The studies in this dissertation test the hypothesis that specific single nucleotide polymorphisms (SNPs) of selected human skin microorganisms can be used to attribute an unknown microbiome sample to an individual. The first study investigated how Wright's fixation index (FST) can be used to select potentially informative SNPs for HID. SNPs with high estimated FST were ascertained in three different ways to examine three distinct hypotheses. The hypotheses focused on testing whether a high FST, increased taxonomic abundance, and/or using a predetermined panel would be the most effective for HID. Classification accuracies ranged from 88–95%, and the method using the most taxa possible performed the best. Results from the study support the viability of using genetic distance to select informative markers from the human skin microbiome for HID. The predetermined panel only achieved an 88% accuracy, although it would be the most applicable of the tested methods for forensic case work. The second study focused on using FST estimations to select SNPs abundant in 51 individuals sampled at three body sites in triplicate for HID. The most common SNPs (present in ≥ 75% of the samples) which had FST estimates ≥ 0.1 were used with least absolute shrinkage and selection operator (LASSO) to select a list of informative SNPs for HID. The final list (i.e., hidSkinPlex+) contains 365 SNPs and achieved a 95% classification accuracy on 459 samples. The hidSkinPlex+ lays the foundation for a targeted sequencing panel that can be used to further study the stability and specificity of human skin microorganism SNPs for HID applications.
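The pre-filtering step described for the second study (presence in at least 75% of samples and an FST estimate of at least 0.1) can be sketched as a simple filter. The record layout, SNP identifiers, and values below are hypothetical, and the subsequent LASSO selection is not shown. The sample count follows from the stated design of 51 individuals, three body sites, and triplicate sampling.

```python
def select_candidate_snps(snps, n_samples, min_presence=0.75, min_fst=0.1):
    """Keep ids of SNPs seen in >= min_presence of samples with FST >= min_fst."""
    return [
        snp["id"]
        for snp in snps
        if snp["n_present"] / n_samples >= min_presence and snp["fst"] >= min_fst
    ]


# 51 individuals x 3 body sites x 3 replicates
n_samples = 51 * 3 * 3

# Hypothetical SNP records for illustration only.
snps = [
    {"id": "snp_a", "n_present": 400, "fst": 0.25},  # common, informative
    {"id": "snp_b", "n_present": 300, "fst": 0.40},  # too rare
    {"id": "snp_c", "n_present": 459, "fst": 0.05},  # FST too low
    {"id": "snp_d", "n_present": 345, "fst": 0.10},  # just over both thresholds
]
candidates = select_candidate_snps(snps, n_samples)
print(n_samples, candidates)
```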
Leading Predictors of COVID-19-Related Poor Mental Health in Adult Asian Indians: An Application of Extreme Gradient Boosting and Shapley Additive Explanations
(MDPI, 2023-01-09) Ikram, Mohammad; Shaikh, Nazneen F.; Vishwanatha, Jamboor K.; Sambamoorthi, Usha
During the COVID-19 pandemic, an increase in poor mental health among Asian Indians was observed in the United States. However, the leading predictors of poor mental health during the COVID-19 pandemic in Asian Indians remained unknown. A cross-sectional online survey was administered to self-identified Asian Indians aged 18 and older (N = 289). The survey collected information on demographic and socio-economic characteristics and the COVID-19 burden. Two novel machine learning techniques, eXtreme Gradient Boosting and SHapley Additive exPlanations (SHAP), were used to identify the leading predictors and explain their associations with poor mental health. A majority of the study participants were female (65.1%), below 50 years of age (73.3%), and had income ≥ $75,000 (81.0%). The six leading predictors of poor mental health among Asian Indians were sleep disturbance, age, general health, income, wearing a mask, and self-reported discrimination. SHAP plots indicated that higher age, wearing a mask, and maintaining social distancing all the time were negatively associated with poor mental health while having sleep disturbance and imputed income levels were positively associated with poor mental health. The model performance metrics indicated high accuracy (0.77), precision (0.78), F1 score (0.77), recall (0.77), and AUROC (0.87). Nearly one in two adults reported poor mental health, and one in five reported sleep disturbance. Findings from our study suggest a paradoxical relationship between income and poor mental health; further studies are needed to confirm our study findings. Sleep disturbance and perceived discrimination can be targeted through tailored intervention to reduce the risk of poor mental health in Asian Indians.
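The reported precision, recall, and F1 score can be cross-checked with the usual harmonic-mean definition of F1. This is only an approximate consistency check: the abstract does not say whether the metrics were averaged across classes, and the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.78, 0.77)
print(round(f1, 2))
```

Rounded to two decimals, the result agrees with the F1 score of 0.77 stated in the abstract.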
Pharmacogenetics of Select Genes in the Opiate Metabolism and Response Pathways
(2018-08) Wendt, Frank R.; Budowle, Bruce; Phillips, Nicole R.; LaRue, Bobby L.; Luedtke, Robert R.; Clark, Abbot F.
Pharmacogenetics and pharmacogenomics aim to elucidate the underlying genetic variation contributing to adverse drug reactions, differential enzyme activity, and resulting appropriate drug dosage on the individual and population levels. Studies with this goal in mind typically rely on targeted genotyping of select single nucleotide polymorphisms (SNPs) and/or insertion/deletion (INDEL) polymorphisms within a gene that have demonstrated significant association with the rate of drug absorption, distribution, excretion, and/or metabolism. This approach may enable association and characterization of clinically relevant polymorphisms with a phenotype of interest and may provide guidance regarding appropriate prescription medication practices for medical professionals. Additionally, these data, namely those of the cytochrome P450, family 2, subfamily D, polypeptide 6 gene (CYP2D6), have contributed to identifying cause and/or manner of death in some death investigations which initially were negative medico-legal autopsies. Though invaluable to medical genetics, the chemistry of targeted genotyping approaches, including genome-wide association studies and SNP-targeted massively parallel sequencing, inherently lacks the capability to discover novel or rare polymorphisms that may be enriched in pharmacogenetically-valuable cohorts (i.e., individuals who have experienced idiosyncratic responses to codeine/morphine). Relatively recently, the pharmacogenetics community has utilized comprehensive (i.e., full-gene) and/or combinatorial (i.e., multi-gene) genetic studies using multiple genes whose protein products are involved in a drug metabolism/response pathway. The multi-gene approach is demonstrably more successful in predicting phenotypic expressions and more efficacious for patient outcomes compared to a single-gene approach.
While multigenic profiles have, to date, mainly been elucidated for psychiatric drugs and disorders, it is reasonable to consider that more efficacious patient outcomes can be achieved using the pathways responsible for other pathologies or drug metabolism and response pathways. The goal of this dissertation was to develop a comprehensive genetic profiling system using the full gene region of five genes that have demonstrated associations between specific SNPs and opiate metabolism/response. The in silico phases of this dissertation aimed to characterize the genes encoding CYP2D6, uridine diphosphate glucuronosyltransferase family 2 member B7 (UGT2B7), adenosine triphosphate (ATP) binding cassette subfamily B member 1 (ABCB1; p-glycoprotein; multidrug resistance protein 1), opioid receptor mu 1 (OPRM1; MOR1), and catechol-O-methyltransferase (COMT) on the individual SNP and full-gene haplotype levels. Subsequent empirical evaluation of these genes was performed on a cohort of deceased tramadol-exposed Finns using targeted genotyping and exome-wide analyses. This dissertation research has 1) described previously uncharacterized individual SNPs that are associated with the metabolism of tramadol to its primary metabolite, O-desmethyltramadol; 2) evaluated the utility of full-gene information for predicting metabolizer phenotype; 3) produced a massively parallel sequencing panel to genotype opiate-metabolism genes in a more comprehensive and combinatorial manner than previously attempted; 4) demonstrated the increased predictive capabilities of a multigenic opiate metabolizer phenotyping system; and 5) identified additional genetic targets that may have predictive phenotypic value.
Prescription Non-Steroidal Anti-Inflammatory Drugs (NSAIDs) and Incidence of Depression Among Older Cancer Survivors With Osteoarthritis: A Machine Learning Analysis
(Sage Publications, 2023-04-27) Shaikh, Nazneen F.; Shen, Chan; LeMasters, Traci; Dwibedi, Nilanjana; Ladani, Amit; Sambamoorthi, Usha
OBJECTIVES: This study examined prescription NSAIDs as one of the leading predictors of incident depression and assessed the direction of the association among older cancer survivors with osteoarthritis. METHODS: This study used a retrospective cohort (N = 14,992) of older adults with incident cancer (breast, prostate, colorectal cancers, or non-Hodgkin's lymphoma) and osteoarthritis. We used the longitudinal data from the linked Surveillance, Epidemiology, and End Results-Medicare data for the study period from 2006 through 2016, with a 12-month baseline and 12-month follow-up period. Cumulative NSAIDs days was assessed during the baseline period and incident depression was assessed during the follow-up period. An eXtreme Gradient Boosting (XGBoost) model was built with 10-fold repeated stratified cross-validation and hyperparameter tuning using the training dataset. The final model selected from the training data demonstrated high performance (Accuracy: 0.82, Recall: 0.75, Precision: 0.75) when applied to the test data. SHapley Additive exPlanations (SHAP) was used to interpret the output from the XGBoost model. RESULTS: Over 50% of the study cohort had at least one prescription of NSAIDs. Nearly 13% of the cohort were diagnosed with incident depression, with the rates ranging between 7.4% for prostate cancer and 17.0% for colorectal cancer. The highest incident depression rate of 25% was observed at 90 and 120 cumulative NSAIDs days thresholds. Cumulative NSAIDs days was the sixth leading predictor of incident depression among older adults with OA and cancer. Age, education, care fragmentation, polypharmacy, and zip code level poverty were the top 5 predictors of incident depression. CONCLUSION: Overall, 1 in 8 older adults with cancer and OA were diagnosed with incident depression. Cumulative NSAIDs days was the sixth leading predictor with an overall positive association with incident depression. However, the association was complex and varied by the cumulative NSAIDs days.
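With an outcome rate near 13%, stratified folds keep the share of positive cases roughly constant across folds. The sketch below shows one simple way to build such folds using only the standard library. It is an illustration of the idea, not the study's implementation: the function name and the toy label vector are hypothetical, and repeating the procedure with different seeds would give repeated stratified cross-validation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets a similar share.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


# Toy labels with 13 positives in 100 samples (hypothetical data).
labels = [1] * 13 + [0] * 87
folds = stratified_folds(labels, k=10)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```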
Using machine learning to identify predictors of imminent drinking and create tailored messages for at-risk drinkers experiencing homelessness
(Elsevier Inc., 2021-04-20) Walters, Scott T.; Businelle, Michael S.; Suchting, Robert; Li, Xiaoyin; Hebert, Emily T.; Mun, Eun-Young
Adults experiencing homelessness are more likely to have an alcohol use disorder compared to adults in the general population. Although shelter-based treatments are common, completion rates tend to be poor, suggesting a need for more effective approaches that are tailored to this understudied and underserved population. One barrier to developing more effective treatments is the limited knowledge of the triggers of alcohol use among homeless adults. This paper describes the use of ecological momentary assessment (EMA) to identify predictors of "imminent drinking" (i.e., drinking within the next 4 h), among a sample of adults experiencing homelessness and receiving health services at a homeless shelter. A total of 78 mostly male (84.6%) adults experiencing homelessness (mean age = 46.6) who reported hazardous drinking completed up to five EMAs per day over 4 weeks (a total of 4557 completed EMAs). The study used machine learning techniques to create a drinking risk algorithm that predicted 82% of imminent drinking episodes within 4 h of the first drink of the day, and correctly identified 76% of nondrinking episodes. The algorithm included the following 7 predictors of imminent drinking: urge to drink, having alcohol easily available, feeling confident that alcohol would improve mood, feeling depressed, lower commitment to being alcohol free, not interacting with someone drinking alcohol, and being indoors. The research team used the results to develop intervention content (e.g., brief tailored messages) that will be delivered when imminent drinking is detected in an upcoming intervention phase. Specifically, we created three theoretically grounded message tracks focused on urge/craving, social/availability, and negative affect/mood, which are further tailored to a participant's current drinking goal (i.e., stay sober, drink less, no goal) to support positive change. 
To our knowledge, this is the first study to develop tailored intervention messages based on likelihood of imminent drinking, current drinking triggers, and drinking goals among adults experiencing homelessness.