Leading Predictors of Economic Burden Among Postmenopausal Women with Heart Failure: An Application of Machine Learning with XGBoost and SHapley Additive exPlanations




Dehghan, Arshama
Park, Chanhyun
Sambamoorthi, Nethra
Shen, Chan
Shara, Nawar
Sambamoorthi, Usha


0000-0002-1081-0950 (Park, Chanhyun)
0000-0002-9949-7306 (Sambamoorthi, Nethra)
0000-0001-5680-3134 (Shen, Chan)
0000-0002-9005-252X (Shara, Nawar)
0000-0001-8311-1360 (Sambamoorthi, Usha)


Objective: Heart failure (HF) is associated with high direct healthcare costs, including out-of-pocket spending by patients. However, there are knowledge gaps in HF research among postmenopausal women. Therefore, this study uses machine learning methods to identify leading predictors and their associations with economic burden among postmenopausal women (age > 50 years) with heart failure.
Methods: This cross-sectional study used data from postmenopausal women with heart failure from the 2020 Medical Expenditure Panel Survey (MEPS; weighted N = 600,742). Economic burden was measured as total healthcare expenditures by payors (third-party expenditures) and out-of-pocket expenditures by patients and their families. We employed eXtreme Gradient Boosting (XGBoost) regression to determine key predictors. Global and local interpretations of associations were performed using SHapley Additive exPlanations (SHAP). Our predictive model used 21 features such as age, health status including comorbidities (anxiety, arthritis, asthma, cancer, COPD, depression, diabetes, high cholesterol, hypertension, and thyroid disease), perceived physical and mental health status, and polypharmacy. Social determinants of health (SDoH) consisted of marital status, health insurance coverage, prescription drug coverage, education, poverty status, and region. Model building included a 70%/30% training/testing split of the data, 10-fold cross-validation, and up to six rounds of optimization in Python 3.9.12. Model performance metrics were mean absolute error, root mean squared error, and coefficient of determination; these were evaluated on the test dataset.
Results: The model showed excellent accuracy, as evidenced by its low mean absolute errors (0.442, 0.310), low root mean squared errors (0.452, 0.342), and high coefficients of determination (0.935, 0.987) for third-party and out-of-pocket expenditures, respectively.
The top 10 leading predictors of third-party expenditures were polypharmacy, age, residence in the Midwest region, asthma, perceived physical health, perceived mental health, anxiety, hypertension, White race, and low income. The SHAP plots for third-party expenditures revealed complex relationships of age, perceived physical health, and perceived mental health with the target variable. Polypharmacy, low income, anxiety, and asthma were associated with higher third-party expenditures. Non-Hispanic White women and those with hypertension had lower third-party expenditures. The top 10 leading predictors of out-of-pocket expenditures were age, Latinx ethnicity, asthma, cancer, being poor, having middle income, having high income, prescription drug coverage, private insurance, and polypharmacy. The plots for out-of-pocket expenditures highlighted only age as a key factor with a complex relationship. Being poor, having middle income, and reporting Latinx ethnicity were associated with lower out-of-pocket expenditures. High income, prescription drug coverage, private insurance, polypharmacy, and the presence of asthma and cancer were associated with higher out-of-pocket expenditures.
Conclusion: The leading predictors differed by payor source. SDoH were associated with economic burden, suggesting that addressing SDoH may reduce healthcare costs. Cost-containment policies, programs, and interventions at the payor and patient levels need to include effective comorbidity management strategies. The limitations of this study include the cross-sectional design, self-reported data that may be subject to recall bias, and the severity of comorbidities, which may affect the economic burden. However, the study also has several strengths, such as nationally representative data, the inclusion of SDoH, validated information on expenditures, and robust, interpretable machine learning methods.
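The evaluation protocol named in the Methods (a 70%/30% training/testing split, 10-fold cross-validation, and the three reported metrics) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the random seed, and the toy values are assumptions made for illustration only, and the XGBoost model fitting and SHAP steps are deliberately left out.

```python
import math
import random


def train_test_split_indices(n, test_fraction=0.30, seed=0):
    """Shuffle row indices and split them into training and testing sets.

    Assumption: a simple random split; the abstract does not say whether
    the split was stratified or accounted for survey weights.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Partition the training indices into k disjoint validation folds."""
    return [indices[i::k] for i in range(k)]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared prediction error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values, not MEPS data.
    train_idx, test_idx = train_test_split_indices(100)
    folds = k_fold_indices(train_idx, k=10)
    print(len(train_idx), len(test_idx), len(folds))

    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.5, 2.0, 2.5, 4.0]
    print(mean_absolute_error(y_true, y_pred))
    print(root_mean_squared_error(y_true, y_pred))
    print(r_squared(y_true, y_pred))
```

In a workflow like the one described, the metrics would be computed once per outcome (third-party and out-of-pocket expenditures) on the held-out 30% of observations, while the ten folds would be used only within the training portion for model tuning.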