Cold deck missing value imputation with a trust-based selection method of multiple web donors

Citation

Mohd Jaya, Mohd Izham (2018) Cold deck missing value imputation with a trust-based selection method of multiple web donors. Doctoral thesis, Universiti Putra Malaysia.

Abstract

Missing values are a common problem in datasets. They decrease data completeness, reduce data quality, and negatively affect the results of data analysis. Existing cold deck imputation copes with this problem by selecting a replacement value from a pool of donors identified in another data source during the imputation process. Compared with other imputation methods, cold deck imputation carries less risk of model misspecification and preserves the data distribution of the dataset. Its limitation, however, is that the chance of finding a trusted, plausible donor is narrow, because only a single data source is used in each imputation process. The many web data sources available today can alleviate this limitation. However, values from multiple web data sources commonly conflict with one another, and existing cold deck imputation does not measure a trust score for each conflicting value. Adopting it with multiple web donors is therefore impractical, because it is difficult to select the most plausible value during the imputation process. This research concentrates on improving data completeness by imputing missing values with a trust-based cold deck imputation, presented as Trust-Based Cold Deck Missing Value Imputation with Multiple Web Donors. The proposed method takes advantage of multiple web donors drawn from web data sources to increase the chance of finding the most plausible values for imputing missing values. Plausible values are selected using a novel trust score computation that combines the accuracy score and the reliability score of each web donor. The performance of the proposed method is evaluated by running a prediction model on the imputed dataset, and a number of experiments quantify the prediction model's accuracy, Root Mean Squared Error (RMSE), and F-Measure.
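
The abstract describes choosing among conflicting donor values by a trust score built from each web donor's accuracy score and reliability score, but it does not give the formula. The Python sketch below only illustrates the general idea under stated assumptions: the names WebDonor, trust_score, select_value, and cold_deck_impute, the weighting parameter alpha, and the rule of summing trust over all donors that supply the same value are hypothetical choices, not the thesis's actual computation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class WebDonor:
    """A web data source that can supply replacement values (hypothetical structure)."""
    name: str
    accuracy: float     # assumed: share of the donor's values matching observed values, in [0, 1]
    reliability: float  # assumed: reliability rating of the web source, in [0, 1]


def trust_score(donor: WebDonor, alpha: float = 0.5) -> float:
    """Hypothetical trust score: a weighted mix of accuracy and reliability."""
    return alpha * donor.accuracy + (1 - alpha) * donor.reliability


def select_value(candidates: List[Tuple[WebDonor, str]], alpha: float = 0.5) -> Optional[str]:
    """Resolve conflicting donor values by summing the trust of every donor backing each value."""
    if not candidates:
        return None  # no donor found: the cell stays missing
    support: Dict[str, float] = defaultdict(float)
    for donor, value in candidates:
        support[value] += trust_score(donor, alpha)
    # Highest accumulated trust wins; on a tie, max() keeps the value seen first.
    return max(support, key=support.get)


def cold_deck_impute(records: List[dict], attribute: str,
                     donor_values: Dict[object, List[Tuple[WebDonor, str]]],
                     alpha: float = 0.5) -> List[dict]:
    """Return copies of records whose missing `attribute` is filled from web donors.

    donor_values maps a record's "id" to the (donor, value) pairs retrieved for it.
    """
    imputed = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not mutated
        if rec.get(attribute) is None:
            rec[attribute] = select_value(donor_values.get(rec["id"], []), alpha)
        imputed.append(rec)
    return imputed


if __name__ == "__main__":
    site_a = WebDonor("site_a", accuracy=0.9, reliability=0.8)  # trust 0.85
    site_b = WebDonor("site_b", accuracy=0.6, reliability=0.5)  # trust 0.55
    site_c = WebDonor("site_c", accuracy=0.4, reliability=0.3)  # trust 0.35
    records = [{"id": 1, "city": None}, {"id": 2, "city": "Serdang"}, {"id": 3, "city": None}]
    donor_values = {1: [(site_a, "A"), (site_b, "B"), (site_c, "B")]}
    print(cold_deck_impute(records, "city", donor_values))
    # [{'id': 1, 'city': 'B'}, {'id': 2, 'city': 'Serdang'}, {'id': 3, 'city': None}]
```

Under the summing rule, two moderately trusted donors that agree ("B": 0.55 + 0.35 = 0.90) outweigh one highly trusted donor ("A": 0.85); picking the single most trusted donor instead would be an equally plausible variant, and the abstract does not say which behaviour the thesis uses.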
The results demonstrate that the proposed method improves on the performance of existing cold deck imputation. The results are also compared with other imputation methods, namely K-Nearest Neighbor (KNN), Mean Imputation (AVG), Case Deletion (IGN), Predictive Mean Matching (PMM), and MissForest. RMSE, prediction accuracy, and F-Measure all improve when the prediction model is trained on datasets imputed with the proposed method. This research contributes to improving data quality, particularly in the information systems (IS) and database fields, where good data quality benefits data analysis performance.
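
The comparison rests on three metrics computed for a prediction model trained on each imputed dataset. As a reference, the Python sketch below gives their standard definitions; the helper names rmse, accuracy, and f_measure are hypothetical, and the binary F-Measure for a single positive class is an assumption, since the abstract does not say how F-Measure is averaged across classes.

```python
import math
from typing import Hashable, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: sqrt(mean((y - y_hat)^2)); lower is better."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Share of predictions equal to the true label; higher is better."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
              positive: Hashable) -> float:
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because RMSE measures error, an improved RMSE means a lower value, whereas accuracy and F-Measure improve by increasing.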

Additional Metadata

Item Type:	Thesis (Doctoral)
Subject:	Mathematical statistics
Subject:	Missing observations (Statistics)
Call Number:	FSKTM 2018 79
Chairman Supervisor:	Assoc. Prof. Fatimah binti Sidi, PhD
Divisions:	Faculty of Computer Science and Information Technology
Depositing User:	Ms. Nur Faseha Mohd Kadim
Date Deposited:	08 Sep 2020 04:09
Last Modified:	07 Jan 2022 08:33
URI:	http://psasir.upm.edu.my/id/eprint/83236