Citation
Jamalai@Jamali, Siti Nurliana
(2024)
Tangible interaction learning model to enhance learning activity processes among children with dyslexia.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
Missing data is a widespread data quality issue across various domains. A common
challenge is the occurrence of missing data during the data input process. Numerous
studies have proposed methods to impute missing values for data across multiple
fields. However, certain domains present unique challenges because they involve
attributes from multiple scientific disciplines, such as biology, chemistry, and medicine,
which complicates the imputation process. Current machine learning models struggle
to handle both missing values and inaccuracies simultaneously, particularly when
dealing with large datasets. These challenges are further compounded by the data type
constraints imposed by these algorithms. Furthermore, most current approaches
focus on the imputation method alone without giving enough attention to the
cleansing and pre-processing phase, which can be crucial to the imputation
mechanism. Moreover, software tools for applying missing data imputation
approaches are limited. Hence, intelligent approaches are needed in data imputation
to determine which set of independent variables is best for imputing missing values
in the dependent variables. A machine learning approach is required to find the
optimal variables. In this research, an
imputation approach named ImputeX, based on the Extremely Randomized Trees
(Extra Trees) ensemble machine learning method, is proposed. This method can
impute both categorical and continuous features in large datasets. In addition,
an application is presented that allows public users to apply the proposed method
in standard and autonomous data imputation modes. The proposed imputation method was
compared with existing imputation methods including MissForest, K-NNI,
HyperImpute, Multivariate Imputation by Chained Equations (MICE), Multiple
Imputation with Denoising Autoencoders (MIDAS), and SoftImpute. The results show
that the proposed method improves execution time by 35% compared with recent
imputation methods and increases accuracy by 0.5% at a 10% missing ratio, with the
accuracy improvement reaching 15% at a 90% missing ratio. Meanwhile, the
presented application achieved the best performance compared with current software
tools such as R packages, Statistical Package for the Social Sciences (SPSS), Stata, and
Microsoft Excel. The significance of this research lies in developing an intelligent
method that addresses both missing values and accuracy in large datasets while
minimizing execution time. By presenting an accurate and reliable imputation method,
this research helps improve data quality. It also contributes to data science by
improving the data cleaning procedure, a step in the data preprocessing stage.
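The abstract describes ImputeX only at a high level, so the following is not ImputeX. It is a minimal sketch, under stated assumptions, of the general idea of imputing with Extremely Randomized Trees: a MissForest-style loop that starts from column means/modes and then repeatedly re-predicts each column's missing entries from the other columns. Continuous columns use regression trees (variance split score, mean of tree predictions); categorical columns are assumed to be integer-coded and use classification trees (Gini split score, majority vote). The Extra Trees ingredients are that each split draws one random threshold per candidate feature and that trees are grown on all observed rows without bootstrapping. All names (`extra_trees_impute`, `build_tree`, `drop`) and parameters (`n_trees`, `n_iter`, `min_leaf`) are hypothetical, and missing values are assumed to be encoded as `None`.

```python
import random
from collections import Counter
from statistics import fmean


def leaf_value(ys, categorical):
    # Majority class for categorical targets, mean for continuous targets.
    return Counter(ys).most_common(1)[0][0] if categorical else fmean(ys)


def impurity(ys, categorical):
    # Gini impurity for categorical targets, variance for continuous ones.
    n = len(ys)
    if categorical:
        return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys) / n


def build_tree(X, y, categorical, rng, k, min_leaf=2):
    # Stop splitting when the node is small or already pure.
    if len(y) <= min_leaf or len(set(y)) == 1:
        return ("leaf", leaf_value(y, categorical))
    best = None
    # Extra Trees: for each of k randomly chosen features, draw ONE threshold
    # uniformly between the node's min and max, then keep the best candidate.
    for f in rng.sample(range(len(X[0])), min(k, len(X[0]))):
        col = [row[f] for row in X]
        lo, hi = min(col), max(col)
        if lo == hi:
            continue
        t = rng.uniform(lo, hi)
        left = [i for i, v in enumerate(col) if v < t]
        right = [i for i, v in enumerate(col) if v >= t]
        if not left or not right:
            continue
        score = (len(left) * impurity([y[i] for i in left], categorical)
                 + len(right) * impurity([y[i] for i in right], categorical))
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return ("leaf", leaf_value(y, categorical))
    _, f, t, left, right = best
    left_tree = build_tree([X[i] for i in left], [y[i] for i in left],
                           categorical, rng, k, min_leaf)
    right_tree = build_tree([X[i] for i in right], [y[i] for i in right],
                            categorical, rng, k, min_leaf)
    return ("node", f, t, left_tree, right_tree)


def predict(tree, row):
    # Walk from the root to a leaf and return the leaf's value.
    while tree[0] == "node":
        _, f, t, left, right = tree
        tree = left if row[f] < t else right
    return tree[1]


def drop(row, j):
    # Feature vector for target column j: every other column.
    return row[:j] + row[j + 1:]


def extra_trees_impute(data, categorical_cols, n_trees=25, n_iter=3, seed=0):
    """Impute None entries column by column with an Extra Trees forest.

    Assumes every column has at least one observed value and that
    categorical columns are integer-coded.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    X = [list(row) for row in data]
    # Initial guess: column mode (categorical) or column mean (continuous).
    for j in range(n_cols):
        fill = leaf_value([r[j] for r in data if r[j] is not None],
                          j in categorical_cols)
        for row in X:
            if row[j] is None:
                row[j] = fill
    k = n_cols - 1  # consider every other column at each split
    # Refine the guesses a few times, MissForest-style.
    for _ in range(n_iter):
        for j in range(n_cols):
            miss = [i for i in range(n_rows) if missing[i][j]]
            if not miss:
                continue
            cat = j in categorical_cols
            obs = [i for i in range(n_rows) if not missing[i][j]]
            Xtr = [drop(X[i], j) for i in obs]
            ytr = [X[i][j] for i in obs]
            # No bootstrapping: every tree sees all observed rows.
            forest = [build_tree(Xtr, ytr, cat, rng, k) for _ in range(n_trees)]
            for i in miss:
                votes = [predict(tree, drop(X[i], j)) for tree in forest]
                X[i][j] = leaf_value(votes, cat)
    return X


# Toy usage: column 1 is continuous, column 2 is an integer-coded category.
rows = [[float(x), 2.0 * x, 1 if x > 10 else 0] for x in range(1, 21)]
rows[3][1] = None   # continuous value missing
rows[15][2] = None  # categorical value missing
rows[7][0] = None
filled = extra_trees_impute(rows, categorical_cols={2}, n_trees=20, seed=42)
print(filled[3], filled[15])
```

The trees are hand-written only to make the random-threshold split rule visible; in practice scikit-learn's `ExtraTreesRegressor` and `ExtraTreesClassifier` would replace them, and none of the execution-time or accuracy figures reported in the abstract can be inferred from this toy.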
Additional Metadata
Item Type: Thesis (Doctoral)
Subject: Dyslexic children - Education
Subject: Malay language - Study and teaching (Primary)
Subject: Human-computer interaction
Call Number: FSKTM 2024 14
Chairman Supervisor: Associate Professor Ts. Fatimah binti Sidi, PhD
Divisions: Faculty of Computer Science and Information Technology
Keywords: Extra Trees, Imputation, Missing Data, Machine Learning, Data Quality
Depositing User: Ms. Rohana Alias
Date Deposited: 09 Oct 2025 08:37
Last Modified: 09 Oct 2025 08:37
URI: http://psasir.upm.edu.my/id/eprint/120146