Citation
Tang, Yan and Jia, Lei and Zhou, Junjun and Dou, Jin and Qian, Jingjuan and Yi, Xin and Soh, Kim Lam
(2026)
Development of an explainable machine learning model for predicting depression in adults with type 2 diabetes mellitus: a cross-sectional SHAP-based analysis of NHANES 2009-2023.
Medicine, 105 (6).
pp. 1-11.
ISSN 1536-5964
Abstract
Depression (DEP) is a common yet underdiagnosed comorbidity in adults with type 2 diabetes mellitus (T2DM), worsening glycemic control and increasing complication risk. Practical, interpretable risk tools using routine patient data are limited. We conducted a cross-sectional analysis using data from adults with T2DM enrolled in the National Health and Nutrition Examination Survey between 2009 and 2023. DEP was classified based on a Patient Health Questionnaire-9 score of 10 or higher. Twenty-eight candidate predictors encompassing demographic characteristics, clinical and biochemical measurements, and lifestyle factors were initially included. Variable selection was performed using least absolute shrinkage and selection operator regression. Five machine learning algorithms (random forest, extreme gradient boosting (XGBoost), multilayer perceptron, logistic regression, and support vector machine) were trained and evaluated using 5-fold cross-validation. The best-performing model was interpreted through SHapley Additive exPlanations analysis to identify the most influential predictors. A streamlined version incorporating the top 10 predictors was further developed and implemented as a user-friendly web-based risk estimation tool. Among 2837 participants, 449 (15.8%) were identified as having comorbid DEP. The XGBoost model demonstrated the highest discriminative ability, with a validation area under the receiver operating characteristic curve of 0.888, accuracy of 0.834, F1-score of 0.715, sensitivity of 0.577, and specificity of 0.979, surpassing the performance of the other algorithms evaluated. SHapley Additive exPlanations analysis revealed gender, poverty-to-income ratio, sleep duration, smoking status, educational level, race, age, high cholesterol, hypertension, and insulin use as the most influential predictors.
A streamlined XGBoost model incorporating only these 10 variables achieved an area under the curve of 0.886, closely matching the predictive capability of the full model. The deployed web-based tool enables rapid and individualized estimation of DEP risk in patients with T2DM using routinely available clinical and demographic information. Explainable machine learning applied to nationally representative data can accurately identify adults with T2DM at heightened risk of DEP using a small set of noninvasive clinical features. The deployed tool offers a scalable, interpretable, and clinically actionable approach to support early detection and intervention, potentially improving mental health outcomes in this high-risk population.
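The outcome definition and the threshold-based metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function and class names are hypothetical, and the confusion-matrix counts are invented for illustration, not taken from the study. Only the PHQ-9 cutoff of 10 comes from the abstract.

```python
from dataclasses import dataclass

PHQ9_CUTOFF = 10  # cutoff stated in the abstract


def has_depression(phq9_score: int) -> bool:
    """Label a participant as DEP when the PHQ-9 total is 10 or higher."""
    # The PHQ-9 has nine items scored 0-3, so totals range from 0 to 27.
    if not 0 <= phq9_score <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    return phq9_score >= PHQ9_CUTOFF


@dataclass
class Confusion:
    """Counts from a binary confusion matrix (DEP is the positive class)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: Confusion) -> dict:
    """Compute the threshold-based metrics reported in the abstract."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # recall for the DEP class
    specificity = _ratio(c.tn, c.tn + c.fp)   # recall for the non-DEP class
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1_denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / f1_denominator if f1_denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = Confusion(tp=30, fp=10, tn=140, fn=20)
example_metrics = classification_metrics(example)
```

The area under the receiver operating characteristic curve is omitted because it depends on predicted probabilities for each participant, not on a single confusion matrix.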