Citation
Baba, Ishaq Abdullahi
(2022)
Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
The reweighted fast, consistent and high breakdown (RFCH) estimator is a multivariate
procedure for robustly estimating the location and scatter matrix. It is incorporated
in the robust Mahalanobis distance to detect high leverage
points in a dataset. The method has shown excellent performance compared to its competitors. However, it cannot be applied when the sample size is smaller than the number
of predictor variables. To address this problem, robust procedures for high
dimensional data based on the RFCH algorithm are developed.
A modified reweighted fast, consistent and high breakdown (MRFCH) estimator for
high dimensional data is developed. It uses only the diagonal elements of the
scatter matrix, instead of the full matrix, when computing the robust Mahalanobis
distance within the RFCH algorithm. The proposed method inherits the robustness
properties of the original RFCH estimator. Simulation results and artificial data examples
show that the proposed MRFCH is more efficient and faster than the minimum
regularized covariance determinant (MRCD) and orthogonalized
Gnanadesikan-Kettenring (OGK) estimators.
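To make the diagonal-only idea concrete, the following minimal sketch computes a Mahalanobis-type distance from per-variable centers and scales alone, so no matrix inversion is needed when there are fewer observations than variables. It is not the MRFCH algorithm: the coordinatewise median and MAD are stand-in robust estimates chosen for illustration, and the function name `diagonal_robust_distance` and the simulated data are assumptions.

```python
import numpy as np

def diagonal_robust_distance(X, center, scale):
    """Distance that uses only the diagonal of a scatter matrix.

    center: per-variable location, shape (p,)
    scale:  per-variable scale (square root of the diagonal), shape (p,)
    """
    Z = (X - center) / scale
    return np.sqrt((Z ** 2).sum(axis=1))

rng = np.random.default_rng(0)
n, p = 20, 100                      # fewer observations than variables
X = rng.normal(size=(n, p))
X[0] += 10.0                        # shift one row to act as a planted outlier

# The full sample covariance is singular here, so it cannot be inverted.
rank = np.linalg.matrix_rank(np.cov(X, rowvar=False))

# Stand-in robust estimates (assumption): median and MAD, not MRFCH.
center = np.median(X, axis=0)
scale = 1.4826 * np.median(np.abs(X - center), axis=0)

d = diagonal_robust_distance(X, center, scale)
outlier_index = int(np.argmax(d))   # row with the largest distance
```

In this toy setting the shifted row receives by far the largest distance, while the rank of the full covariance matrix stays below the number of variables, which is why the full-matrix distance is unavailable.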
Outlier detection and classification are critical issues that affect prediction accuracy
if not handled correctly. The Mahalanobis distance (MD) is one of the most
popular multivariate analysis tools for detecting multivariate outlying observations.
However, the traditional MD based on the classical mean and covariance often fails
to identify all the multivariate outliers in a dataset because it suffers from
masking and swamping. Therefore, the robust location and covariance matrix based
on the MRFCH are used instead of the classical estimators to tackle these problems.
The proposed algorithm has been applied to detect outliers in high dimensional
data. The results of the simulation study and real data sets indicate that
the proposed method has high detection power with minimal misclassification
error compared to the MRCD and minimum diagonal product (MDP) methods.
Classical correlation estimators, which rely on the sample means of the dependent
and independent variables, are known to be affected by outliers. Therefore, a robust
weighted correlation coefficient that reduces the effect of outliers is proposed.
Weights based on the robust distance computed from MRFCH, RD (MRFCH), are
incorporated into the proposed robust correlation. The performance of the proposed
method is illustrated using a simulation study and three real data sets: glass vessel data with 1920 variables,
cardiomyopathy microarray data with 6319 variables, and octane data with
226 variables. The results show that the robust weighted correlation based on
RD (MRFCH) is more powerful and efficient than the existing methods, irrespective
of dimension, sample size, and contamination level.
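As a rough illustration of how weights can reduce the effect of an outlier on a correlation coefficient, the sketch below implements a generic weighted Pearson correlation. The abstract does not state the weight function derived from RD (MRFCH), so the 0/1 weights here are set by hand and are purely an assumption, as are the function name `weighted_corr` and the toy data.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation with non-negative observation weights."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    mx = np.average(x, weights=w)           # weighted means
    my = np.average(y, weights=w)
    cov = np.sum(w * (x - mx) * (y - my))   # weighted cross-product
    vx = np.sum(w * (x - mx) ** 2)
    vy = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(vx * vy)

x = np.arange(10.0)
y = 2.0 * x + 1.0
y[-1] = -100.0                              # one gross outlier in the response

# Hypothetical weights: 1 for regular points, 0 for the flagged outlier.
w = np.ones_like(x)
w[-1] = 0.0

r_classical = weighted_corr(x, y, np.ones_like(x))  # equal weights = ordinary Pearson
r_weighted = weighted_corr(x, y, w)
```

With equal weights the function reduces to the ordinary Pearson correlation, which the single outlier distorts; downweighting that observation recovers the exact linear relationship among the remaining points.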
Correlation-based sure screening methods are popular tools for selecting the most
significant variables of the true model in sparse, high dimensional analysis. However,
in practice, high leverage points may lead to misleading variable
selection results. Therefore, a robust sure independence screening procedure
based on the MRFCH weighted correlation algorithm for high dimensional data
is developed to address this problem. The results of the simulation study and real data sets
indicate that the proposed MRFCHCS+LAD-SCAD estimator performs
best among the methods compared in this study.
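The screening step can be sketched as ranking predictors by the absolute value of a weighted marginal correlation with the response and keeping the top few. This is a minimal illustration under stated assumptions, not the MRFCHCS+LAD-SCAD procedure: the weights are set by hand rather than derived from RD (MRFCH), the subsequent LAD-SCAD fit is omitted, and the function name `screen_by_correlation` and the simulated data are hypothetical.

```python
import numpy as np

def screen_by_correlation(X, y, weights, keep):
    """Rank predictors by |weighted marginal correlation| with y; keep the top ones."""
    w = weights / weights.sum()
    Xc = X - w @ X                      # center each column at its weighted mean
    yc = y - w @ y
    cov = (w * yc) @ Xc                 # weighted covariances, shape (p,)
    var_x = w @ (Xc ** 2)
    var_y = w @ (yc ** 2)
    corr = cov / np.sqrt(var_x * var_y)
    order = np.argsort(-np.abs(corr))   # largest absolute correlation first
    return order[:keep], corr

rng = np.random.default_rng(1)
n, p = 100, 500                         # more variables than observations
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 5] + 3.0 * X[:, 17] + 0.1 * rng.normal(size=n)

# Contaminate one row so that it becomes a bad high leverage point.
X[0] += 20.0
y[0] = -200.0

# Hypothetical weights: the flagged row gets weight 0, all others weight 1.
weights = np.ones(n)
weights[0] = 0.0

selected, corr = screen_by_correlation(X, y, weights, keep=2)
```

Because the contaminated row carries zero weight, the ranking depends only on the clean observations, and the two active predictors (columns 5 and 17 in this simulation) come out on top.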