UPM Institutional Repository

A robust prioritization framework of data quality dimensions to improve ML-driven healthcare systems using AHP and sensitivity analysis


Citation

Al-Hgaish, Areen Metib and Atan, Rodziah and Yaakob, Razali and Osman, Mohd Hafeez (2025) A robust prioritization framework of data quality dimensions to improve ML-driven healthcare systems using AHP and sensitivity analysis. IEEE Access, 13. pp. 158057-158082. ISSN 2169-3536

Abstract

The success of Machine Learning (ML) models in healthcare relies heavily on the quality of data used. High-quality data are crucial for improving the predictive capabilities and the overall performance of ML systems. Despite this, research on data quality in healthcare and ML remains limited, with varying definitions of issues and dimensions across contexts. This study introduces a structured, expert-driven framework for prioritizing data quality dimensions critical to ML performance in healthcare. In contrast to performance evaluation studies involving machine learning algorithms or classifiers, this research does not encompass the training or comparison of predictive models. It addresses a key gap in ML data control by integrating ISO/IEC 25012 with Priestley’s classification and using the Analytic Hierarchy Process (AHP) to evaluate 15 dimensions based on expert judgment. The findings identify Completeness (21.25%), Accuracy (15.53%), Consistency (14.31%), Currentness (14.82%), and Precision (13.82%) as the most influential dimensions for ML healthcare outcomes. A One-at-a-Time (OAT) sensitivity analysis with ±17.6% perturbation confirms the robustness of prioritization despite expert input variability. Key contributions include: 1) a tailored framework for ML healthcare data; 2) AHP-based dimension prioritization; 3) validation through sensitivity testing; 4) insights into data quality’s impact on ML fairness and transparency; and 5) practical guidance for data governance and resource allocation. Future work will apply this framework to clinical datasets to validate its effectiveness in enhancing ML model performance and generalizability.
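As a rough illustration of the two techniques named in the abstract, the sketch below derives AHP priority weights from a pairwise comparison matrix, checks Saaty's consistency ratio, and then runs a One-at-a-Time perturbation of ±17.6% on each weight to see whether the ranking holds. It is not the authors' implementation. The 3×3 matrix, the row geometric mean method, and the proportional rescaling of the unperturbed weights are all assumptions made for the example; the abstract does not state how the paper computes or perturbs its weights, and the values here are not taken from the paper.

```python
import math

# Saaty's random consistency index for matrix sizes 1..5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Priority vector via the row geometric mean, normalized to sum to 1."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Saaty's CR = CI / RI, with lambda_max estimated from (A w)_i / w_i."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def oat_rankings(weights, delta):
    """Perturb each weight by +/-delta (relative), rescale the remaining
    weights proportionally so the total stays 1, and return each ranking."""
    results = []
    for k, w in enumerate(weights):
        for sign in (+1, -1):
            new_k = w * (1 + sign * delta)
            scale = (1 - new_k) / (1 - w)
            perturbed = [new_k if i == k else x * scale
                         for i, x in enumerate(weights)]
            ranking = sorted(range(len(weights)), key=lambda i: -perturbed[i])
            results.append((k, sign, ranking))
    return results


# Hypothetical expert judgments for three dimensions (illustrative only).
names = ["Completeness", "Accuracy", "Consistency"]
matrix = [
    [1.0, 2.0, 3.0],
    [1 / 2, 1.0, 2.0],
    [1 / 3, 1 / 2, 1.0],
]

weights = ahp_weights(matrix)
cr = consistency_ratio(matrix, weights)
baseline = sorted(range(len(weights)), key=lambda i: -weights[i])
stable = all(r == baseline for _, _, r in oat_rankings(weights, 0.176))

for name, w in zip(names, weights):
    print(f"{name}: {w:.4f}")
print(f"CR = {cr:.4f} (judgments are usually accepted when CR < 0.10)")
print("Ranking stable under +/-17.6% OAT perturbation:", stable)
```

In this setup a ranking counts as robust when no single-weight perturbation changes the order of the dimensions. A study with 15 dimensions would need a random index table extended beyond the five sizes listed here.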


Download File

121518.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL or Download Paper: https://ieeexplore.ieee.org/document/11131122/

Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology; Faculty of Engineering
DOI Number: https://doi.org/10.1109/ACCESS.2025.3601031
Publisher: Institute of Electrical and Electronics Engineers
Keywords: Data quality; Machine learning; Healthcare; Data dimensions; Predictive models; Analytic hierarchy process; Sensitivity analysis
Depositing User: MS. HADIZAH NORDIN
Date Deposited: 05 Nov 2025 03:21
Last Modified: 05 Nov 2025 06:56
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1109/ACCESS.2025.3601031
URI: http://psasir.upm.edu.my/id/eprint/121518