Relationship Between Pain and Total Healthcare Expenditure in Elderly Osteoarthritis Patients: An Interpretable Machine Learning (ML) Investigation with eXtreme Gradient Boosting








Osteoarthritis (OA), a painful degenerative joint condition, affects over 32.5 million adults in the United States, and one-fourth of them experience severe joint pain. In 2019, adults with OA incurred $45.4 billion more in annual healthcare expenditures than those without OA. Although statistical methods have been used to study the additional costs associated with pain among adults with OA, there remains a notable gap in understanding this relationship through the lens of ML methods. The objective of this study is to determine whether pain is a leading predictor of economic burden among older adults (age > 65 years) with OA using ML methods.


We used data on older adults (age > 65 years) with OA (N = 1,640) from the 2021 Medical Expenditure Panel Survey (MEPS), a nationally representative survey of US households. Economic burden was represented by log-transformed total healthcare expenditures, which included payments by both insurers and patients. We employed eXtreme Gradient Boosting (XGBoost) regression to determine key predictors. Global and local interpretations of associations were performed using SHapley Additive exPlanations (SHAP), including a Partial Dependence Plot (PDP) for pain. Our predictive model used 24 features, including biological factors (sex, age), race and ethnicity, clinical factors (pain, polypharmacy, physical and mental health status, and chronic conditions), and Social Determinants of Health (SDOH) such as marital status, education, poverty status, census region, insurance coverage, and prescription drug coverage. Chronic conditions included anxiety, depression, thyroid disease, diabetes, hypertension, coronary artery disease, cancer, hyperlipidemia, asthma, and chronic obstructive pulmonary disease. Pain interfering with regular work over the past four weeks was assessed using the Veterans RAND 12-Item Health Survey (VR-12), with a Likert scale ranging from 0 (none) to 4 (extreme) representing pain level.

Missing values for pain level were imputed using K-Nearest Neighbors (KNN) imputation. Model building used a 70% training and 30% testing split of the data with 3-fold cross-validation, implemented in Python 3.10.12. Model performance was evaluated on the test dataset using R-squared, mean absolute error, and Root Mean Square Error (RMSE).


Approximately one in four adults with OA reported moderate to extreme pain. The top three predictors of total healthcare expenditures were polypharmacy, physical health, and pain level. Higher pain levels and polypharmacy were associated with higher total expenditures, whereas excellent physical health was associated with lower total expenditures. Additionally, the SHAP PDP suggested a linear relationship between pain level and total expenditures. Model performance for total expenditures was modest, with a mean absolute error of 1.086, an RMSE of 1.736, and an R-squared of 0.452.


Higher pain levels predicted higher economic burden in older adults with OA. Effective pain management may be a pathway to reducing the economic burden of OA. Because polypharmacy was a leading predictor of healthcare expenditures, this model underscores the importance of reducing polypharmacy in older adults through medication utilization review and management.