Abstract
The success of Machine Learning (ML) models in healthcare relies heavily on the quality of data used. High-quality data are crucial for improving the predictive capabilities and the overall performance of ML systems. Despite this, research on data quality in healthcare and ML remains limited, with varying definitions of issues and dimensions across contexts. This study introduces a structured, expert-driven framework for prioritizing data quality dimensions critical to ML performance in healthcare. In contrast to performance evaluation studies involving machine learning algorithms or classifiers, this research does not encompass the training or comparison of predictive models. It addresses a key gap in ML data control by integrating ISO/IEC 25012 with Priestley’s classification and using the Analytic Hierarchy Process (AHP) to evaluate 15 dimensions based on expert judgment. The findings identify Completeness (21.25%), Accuracy (15.53%), Consistency (14.31%), Currentness (14.82%), and Precision (13.82%) as the most influential dimensions for ML healthcare outcomes. A One-at-a-Time (OAT) sensitivity analysis with ±17.6% perturbation confirms the robustness of prioritization despite expert input variability. Key contributions include: 1) a tailored framework for ML healthcare data; 2) AHP-based dimension prioritization; 3) validation through sensitivity testing; 4) insights into data quality’s impact on ML fairness and transparency; and 5) practical guidance for data governance and resource allocation. Future work will apply this framework to clinical datasets to validate its effectiveness in enhancing ML model performance and generalizability.
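The abstract names two techniques without showing their mechanics: AHP for deriving priority weights and a One-at-a-Time (OAT) sensitivity check. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The 3x3 pairwise matrix is hypothetical (the study evaluates 15 dimensions and its expert judgments are not given here), the function names `ahp_weights`, `consistency_ratio`, and `oat_perturb` are illustrative, and perturbing a weight and rescaling the others proportionally is only one common way to run OAT. Only the ±17.6% perturbation size is taken from the abstract.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix sizes 3..10.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(pairwise):
    """Return (weights, lambda_max) from a reciprocal pairwise comparison matrix.

    Weights are the principal eigenvector, normalised to sum to 1.
    """
    a = np.asarray(pairwise, dtype=float)
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))          # principal (Perron) eigenvalue
    w = np.abs(eigvecs[:, k].real)
    return w / w.sum(), float(eigvals[k].real)


def consistency_ratio(lambda_max, n):
    """Saaty's consistency ratio; values below 0.10 are usually accepted."""
    if n < 3:
        return 0.0                             # 1x1 and 2x2 are always consistent
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def oat_perturb(weights, index, delta):
    """Scale one weight by (1 + delta) and rescale the rest so the sum stays 1."""
    w = np.asarray(weights, dtype=float).copy()
    new_value = w[index] * (1.0 + delta)
    others = np.arange(len(w)) != index
    w[others] *= (1.0 - new_value) / w[others].sum()
    w[index] = new_value
    return w


# Hypothetical judgments for three dimensions (NOT data from the study).
names = ["Completeness", "Accuracy", "Consistency"]
pairwise = [[1.0, 2.0, 3.0],
            [1 / 2, 1.0, 2.0],
            [1 / 3, 1 / 2, 1.0]]

weights, lambda_max = ahp_weights(pairwise)
cr = consistency_ratio(lambda_max, len(names))
baseline_rank = list(np.argsort(-weights))

# OAT: perturb each weight by +/-17.6% in turn and check whether the ranking moves.
rank_stable = all(
    list(np.argsort(-oat_perturb(weights, i, d))) == baseline_rank
    for i in range(len(names))
    for d in (+0.176, -0.176)
)
```

In this toy case the ranking is unchanged under every perturbation, which is the kind of robustness the abstract reports. A real analysis would use the full 15-dimension hierarchy and the aggregated expert judgments.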
Download File
Official URL or Download Paper: https://ieeexplore.ieee.org/document/11131122/
Additional Metadata
| Field | Value |
|---|---|
| Item Type: | Article |
| Divisions: | Faculty of Computer Science and Information Technology; Faculty of Engineering |
| DOI Number: | https://doi.org/10.1109/ACCESS.2025.3601031 |
| Publisher: | Institute of Electrical and Electronics Engineers |
| Keywords: | Data quality; Machine learning; Healthcare; Data dimensions; Predictive models; Analytic hierarchy process; Sensitivity analysis |
| Depositing User: | MS. HADIZAH NORDIN |
| Date Deposited: | 05 Nov 2025 03:21 |
| Last Modified: | 05 Nov 2025 06:56 |
| Altmetrics: | http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1109/ACCESS.2025.3601031 |
| URI: | http://psasir.upm.edu.my/id/eprint/121518 |
