UPM Institutional Repository

ExtraImpute: a novel machine learning method for missing data imputation


Citation

Alabadla, Mustafa and Sidi, Fatimah and Ishak, Iskandar and Ibrahim, Hamidah and Affendey, Lilly Suriani and Hamdan, Hazlina (2022) ExtraImpute: a novel machine learning method for missing data imputation. Journal of Advances in Information Technology, 13 (5). pp. 470-476. ISSN 1798-2340

Abstract

Missing values are a common occurrence in healthcare datasets. Their presence often leads to undesirable results when data are analyzed with machine learning methods. Recently, researchers have proposed several imputation approaches to deal with missing values in real-world datasets. Data imputation also helps build high-performance machine learning models that discover patterns in healthcare data and provide insights for higher-quality decision-making. In this paper, we propose a new imputation approach, named ExtraImpute, that uses Extremely Randomized Trees (Extra Trees), an ensemble learning method, to handle missing numerical values in a healthcare context. The proposed method can impute both continuous and discrete features. It imputes each missing value in a feature by predicting it from the other observed values in the dataset. To evaluate the algorithm, several experiments are conducted on five benchmark healthcare datasets, and the results are compared with those of other commonly used imputation methods, viz. missForest, KNNImpute, Multivariate Imputation by Chained Equations (MICE), and SoftImpute. The results were validated using Root Mean Square Error (RMSE) and Coefficient of Determination (R²) scores. On these datasets and metrics, the proposed algorithm outperformed the compared imputation techniques.
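
The general idea in the abstract (predict each missing entry from the other observed values with Extra Trees, then score the result with RMSE and R²) can be sketched as follows. This is not the authors' ExtraImpute implementation, which is not available from this record. It is a minimal stand-in that assumes scikit-learn's IterativeImputer with an ExtraTreesRegressor as the per-feature predictor. The synthetic data, the 10% masking rate, and all parameter values are illustrative assumptions, not settings from the paper.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic, correlated numeric features standing in for a healthcare
# table (hypothetical data, not one of the paper's benchmark datasets).
n = 300
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=n)
x2 = x0 - x1 + rng.normal(scale=0.1, size=n)
X_true = np.column_stack([x0, x1, x2])

# Hide about 10% of the entries at random so the imputations can be
# compared against known ground truth (assumed evaluation protocol).
mask = rng.random(X_true.shape) < 0.10
X_missing = X_true.copy()
X_missing[mask] = np.nan

# Each feature with missing values is regressed on the other features
# using Extra Trees; the fitted model predicts the missing entries.
imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)

# Score only the entries that were hidden.
rmse = float(np.sqrt(mean_squared_error(X_true[mask], X_imputed[mask])))
r2 = float(r2_score(X_true[mask], X_imputed[mask]))
print(f"RMSE on masked entries: {rmse:.3f}")
print(f"R2 on masked entries:   {r2:.3f}")
```

Lower RMSE and higher R² on the hidden entries indicate better imputation. Swapping the estimator or the imputer class in the same harness would give a like-for-like comparison with other methods, again only as an approximation of the paper's experimental setup.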


Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.12720/jait.13.5.470-476
Publisher: Engineering and Technology Publishing
Keywords: Extra trees; Healthcare; Imputation; Missing values
Depositing User: Ms. Che Wa Zakaria
Date Deposited: 06 Oct 2023 23:13
Last Modified: 06 Oct 2023 23:13
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.12720/jait.13.5.470-476
URI: http://psasir.upm.edu.my/id/eprint/101446
