Browsing by Author "Elchehabi, Sahar"
Item: Assessing the Reliability of Current AI Platforms in Delivering Health Information Related to Crohn Disease, Ulcerative Colitis, and Colorectal Cancer (2024-03-21)
Isa, Salman; Elchehabi, Sahar; Jafri, Faraz; Hoang, Long; Sharma, Mukesh; Hapuarachchi, Menalee; Gonzales, Gabriel; Lewis, Trina; Richardson, Justin; Nguyen, Elizabeth; Hyman, Charles

Purpose: In recent years, advances in artificial intelligence (AI) have revolutionized the way we seek and access health information. With more people turning to AI for answers to their problems, it is important to ask how safe it is to rely on AI for answers to health-related questions. We explored the accuracy of ChatGPT (a language model developed by OpenAI) and Gemini (Google's AI platform) in providing health information related to Crohn disease, ulcerative colitis, and colorectal cancer.

Methods: We generated 10 questions on the social, psychological, economic, and physical challenges that patients with Crohn disease, ulcerative colitis, and colorectal cancer may face. Each query was adapted for each disease, resulting in 30 total questions, which were posed to the two separate AI models. We then regenerated each response for a total of three times, yielding 90 generated responses per AI model. We also measured the Flesch-Kincaid readability scores for each response and analyzed the sentiment of the text using natural language processing and computational linguistics. The Centers for Disease Control and Prevention (CDC) recommends that medical information for the public be written at no higher than an eighth-grade reading level. Generated AI responses were evaluated for accuracy by six gastroenterology attendings and fellows within the context of a patient seeking information. A set was deemed inappropriate if any of its three responses contained inaccurate or misleading information, based on clinical judgment. Evaluators were blinded to model names and prices.
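The Flesch-Kincaid measures reported in this abstract come from standard published formulas over words-per-sentence and syllables-per-word. A minimal sketch in Python follows; the syllable counter is a rough vowel-group heuristic of my own (a hypothetical simplification — dedicated libraries such as textstat use more careful rules), so scores will differ slightly from tool to tool:

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid(text: str) -> tuple[float, float]:
    """Return (reading grade level, reading ease score) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)          # words per sentence
    spw = syllables / len(words)               # syllables per word
    grade = 0.39 * wps + 11.8 * spw - 15.59    # Flesch-Kincaid grade level
    ease = 206.835 - 1.015 * wps - 84.6 * spw  # Flesch reading ease
    return grade, ease
```

Under the CDC guidance cited above, a response passes the readability bar when its grade level is at most 8; note that a higher reading-ease score corresponds to a lower grade level, matching the Gemini-vs-ChatGPT pattern in the Results.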
Interrater agreement (94%) and reliability (κ = 0.87) were excellent. The study was performed in July 2023.

Results: Of the 60 questions posed to the two AI language models, 45% (n = 27) of the responses were found to be inaccurate. When the two models were compared, 43.33% (n = 13) of ChatGPT's responses were deemed inaccurate, while 46.7% (n = 14) of Gemini's responses were deemed inaccurate. ChatGPT's responses averaged a Flesch-Kincaid reading grade level of 13.20 and a Flesch-Kincaid readability score of 31.06. Gemini's responses averaged a Flesch-Kincaid reading grade level of 8.34 and a Flesch-Kincaid readability score of 56.92. ChatGPT's average sentiment score was 1.23, while Gemini's was 0.92.

Conclusion: While OpenAI's ChatGPT and Google's Gemini platform can serve as valuable resources for information retrieval, they have limitations when it comes to health-related information for Crohn disease, ulcerative colitis, and colorectal cancer. Importantly, both AI models in the study provided inappropriate responses to common patient questions regarding these conditions. Medical professionals should be aware of these limitations, as they may lead to the spread of misinformation in populations with limited access to health care.

Item: Evaluating Equity in XGBoost Predictions of High Healthcare Expenditures for Older Women with Osteoarthritis in the United States (2024-03-21)
Elchehabi, Sahar; Dehghan, Arshama; Pathak, Mona; Sambamoorthi, Nethra; Park, Chanhyun; Shen, Chan; Sambamoorthi, Usha

Purpose: Osteoarthritis (OA) is a highly prevalent and debilitating condition among older adults. Studies suggest that women are more prone than men to develop symptomatic disease. OA is associated with high direct healthcare costs, attributable to its complex disease management. Furthermore, a small segment of this population may incur very high costs.
Identifying these high-cost users is important for resource allocation, cost containment, quality improvement, and population health management. However, current research on predicting high-cost users in OA with machine learning (ML) models is limited. Furthermore, ML predictions of high-cost users must be equitable across sensitive attributes such as race and ethnicity and socioeconomic status. This study used ML methods to investigate the leading predictors of high-cost users among older women with OA and the fairness of the ML algorithm's predictions across subgroups of race and ethnicity, poverty, and education.

Methods: A cross-sectional study was conducted using data on older women (age > 65 years) with OA from the 2021 Medical Expenditure Panel Survey, a nationally representative survey of non-institutionalized civilian households in the US. High-cost users were defined as those above the 90th percentile (> $39,388) in total healthcare expenditures. Key predictors were identified using the interpretable ML model eXtreme Gradient Boosting (XGBoost) classification with SHapley Additive exPlanations (SHAP). Overall model fit was evaluated with AUC, recall, and precision. Fairness was measured with demographic parity, equalized odds, disparate impact, and equal opportunity across racial and ethnic groups (Non-Hispanic White (NHW), Non-Hispanic Black (NHB), and Hispanic), education (no college and college), and poverty status (low income and high income). Counterfactual fairness was evaluated to ensure consistency in high-cost predictions between actual scenarios and counterfactual situations in which individuals belong to different groups.

Results: A higher percentage of Hispanic (12.2%) and NHB (14.4%) older women were high-cost users compared with NHW (9.0%).
A higher percentage of older women without college education (10.7%) and with low income (11.2%) were high-cost users compared with those with college education (2.5%) and high income (5.2%). The overall model fit was acceptable, with AUC 0.81, recall 0.62, and precision 0.91. Multimorbidity, high school education level, and anxiety were the top three predictors of high-cost use. Prediction was lower among older women without college education (AUC = 0.80) and with low income (AUC = 0.77) compared with the overall prediction (AUC = 0.81). Demographic parity revealed little to no difference across race and ethnicity, education, and income groups.

Conclusion: The fairness metrics indicated no bias in the predictions, likely attributable to the nationally representative nature of the survey sample and its large size. These findings need to be confirmed with other data that contain diverse populations. The leading predictors indicate that effective management of multimorbidity may reduce the risk of high-cost use in older women with OA.
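The group-fairness quantities named in the second abstract (demographic parity, disparate impact, equal opportunity) are simple comparisons of prediction rates between subgroups. The sketch below is illustrative only, not the authors' pipeline: it assumes binary 0/1 high-cost predictions and a boolean indicator separating a comparison subgroup (e.g., NHB) from a reference subgroup (e.g., NHW):

```python
def fairness_metrics(y_true, y_pred, group):
    """Compare binary high-cost predictions between a comparison subgroup
    (group=True) and a reference subgroup (group=False)."""

    def mean(xs):
        return sum(xs) / len(xs)

    # Selection rates: P(prediction = 1 | subgroup)
    rate_g = mean([p for p, g in zip(y_pred, group) if g])
    rate_r = mean([p for p, g in zip(y_pred, group) if not g])
    # True-positive rates: P(prediction = 1 | actual high-cost, subgroup)
    tpr_g = mean([p for p, t, g in zip(y_pred, y_true, group) if g and t == 1])
    tpr_r = mean([p for p, t, g in zip(y_pred, y_true, group) if not g and t == 1])
    return {
        "demographic_parity_diff": rate_g - rate_r,  # 0 means equal selection rates
        "disparate_impact_ratio": rate_g / rate_r,   # near 1 suggests no disparate impact
        "equal_opportunity_diff": tpr_g - tpr_r,     # 0 means equal true-positive rates
    }
```

In the study's setting, y_true would be actual high-cost status, y_pred the XGBoost predictions, and group a subgroup indicator such as NHB vs. NHW; values near 0 for the differences and near 1 for the ratio correspond to the "little to no differences" the authors report.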