UPM Institutional Repository

Diabetes prediction using hybrid supervised and unsupervised techniques based on PIMA dataset


Citation

Abu-Shareha, Ahmad Adel and Abualhaj, Mosleh and H. Hussein, Abdelrahman and Amer, Amal and Achuthan, Anusha and Abdul Halin, Alfian (2025) Diabetes prediction using hybrid supervised and unsupervised techniques based on PIMA dataset. Journal of Artificial Intelligence and Technology, 6. pp. 79-87. ISSN 2766-8649

Abstract

Diabetes prediction using machine learning remains challenging due to the limited size and inherent imbalance of available medical datasets. This paper presents a hybrid framework that blends supervised and unsupervised machine learning techniques to improve the accuracy and robustness of early diabetes prediction. The proposed framework integrates clustering, feature selection, and classification to enhance predictive performance on small-scale medical datasets, specifically the PIMA Indian Diabetes Dataset. Feature selection using Mutual Information minimizes computational complexity while maintaining discriminative power. The unsupervised clustering component groups similar patient records to reduce intra-class variability, improving class separability for the subsequent supervised learning stage. Thirteen classifiers, including Support Vector Machine, K-Nearest Neighbors, Decision Tree, Random Forest (RF), Neural Networks, Adaptive Boosting, Gaussian Naïve Bayes, Quadratic Discriminant Analysis, Skope Rules, eXtreme Gradient Boosting (XGB), Gradient Boosting, Deep Neural Network, and Logistic Regression, are evaluated to compare model performance under clustered and non-clustered settings. Experimental results show that ensemble-based classifiers, particularly RF and XGB, achieve the highest accuracy, precision, recall, and area under the curve (AUC) scores across two optimized clusters, confirming that integrating clustering and feature selection substantially improves the robustness of diabetes prediction models. The proposed framework achieved 88.5% accuracy, 0.836 precision, 0.836 recall, 0.836 F-measure, and 0.874 AUC with the RF classifier, and 88.5% accuracy, 0.838 precision, 0.832 recall, 0.835 F-measure, and 0.873 AUC with the XGB classifier.
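
The three stages named in the abstract (Mutual Information feature selection, clustering, then classification within each cluster) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the abstract does not say how features are discretized, which clustering algorithm is used, or whether one classifier is trained per cluster, so each of those choices below is an assumption. Features are split at their median before Mutual Information is computed, a small deterministic k-means stands in for the clustering component, and a nearest-class-centroid rule stands in for classifiers such as RF or XGB. All function names are hypothetical, and the data are synthetic toy numbers, not PIMA records.

```python
import math
import statistics
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def select_features(X, y, k):
    """Column indices of the k features with the highest MI with the label.
    Assumption: each feature is discretized by a split at its median."""
    scores = []
    for column in zip(*X):
        cut = statistics.median(column)
        scores.append(mutual_information([v > cut for v in column], y))
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:k])

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(point, centers):
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))

def kmeans(points, k=2, iters=10):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers

def class_centroids(rows, labels):
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: centroid(members) for label, members in grouped.items()}

def fit(X, y, n_features=2, n_clusters=2):
    """Select features, cluster the records, fit one simple model per cluster."""
    keep = select_features(X, y, n_features)
    Xs = [[row[j] for j in keep] for row in X]
    centers = kmeans(Xs, n_clusters)
    fallback = class_centroids(Xs, y)  # used only if a cluster ends up empty
    models = []
    for c in range(n_clusters):
        members = [(r, l) for r, l in zip(Xs, y) if nearest(r, centers) == c]
        models.append(class_centroids(*zip(*members)) if members else fallback)
    return keep, centers, models

def predict(model, row):
    """Route the record to its cluster, then pick the nearest class centroid."""
    keep, centers, models = model
    r = [row[j] for j in keep]
    local = models[nearest(r, centers)]
    return min(local, key=lambda label: dist2(r, local[label]))

def toy_data():
    """Synthetic records: columns 0 and 2 are noise, 1 and 3 are informative."""
    X, y = [], []
    for group, label, count in [(0, 0, 12), (0, 1, 4), (1, 0, 4), (1, 1, 12)]:
        for j in range(count):
            X.append([j % 2, 25 + 30 * group + j % 3, (j // 2) % 2,
                      100 + 20 * label + 10 * group + j % 4])
            y.append(label)
    return X, y

X, y = toy_data()
model = fit(X, y)
print("selected feature columns:", model[0])
print("prediction, record in first cluster:", predict(model, [0, 26, 0, 115]))
print("prediction, record in second cluster:", predict(model, [0, 56, 0, 115]))
```

In this toy setup the two example records share the same value in column 3 but fall into different clusters, so each is compared against a different pair of class centroids. That is the intuition behind clustering before classification: each per-cluster model only has to separate classes among similar records.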


Download File

123687.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL or Download Paper: https://ojs.istp-press.com/jait/article/view/899

Additional Metadata

Item Type: Article
Subject: Artificial Intelligence
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.37965/jait.2025.0899
Publisher: Intelligence Science and Technology Press Inc.
Keywords: Classification; Clustering; Diabetes prediction
Depositing User: Mr. Mohamad Syahrul Nizam Md Ishak
Date Deposited: 17 Mar 2026 00:35
Last Modified: 17 Mar 2026 00:35
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.37965/jait.2025.0899
URI: http://psasir.upm.edu.my/id/eprint/123687