Citation
Abdul Wahab, Siti Zahariah
(2023)
Robust diagnostics and parameter estimation in linear regression for high dimensional data.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
Several methods for identifying high leverage points (HLPs) in high dimensional data
(HDD) have been put forth, including the Robust Mahalanobis Distance (RMD) based
on the Minimum Regularized Covariance Determinant (MRCD) and Robust Principal
Component Analysis (ROBPCA). However, they suffer from masking and swamping
effects when the number of predictor variables is at least 700. To address this problem,
a modified HLP detection method called Robust Mahalanobis Distance based on the
combination of the Minimum Regularized Covariance Determinant and Principal
Component Analysis (RMD-MRCD-PCA) is proposed. Empirical evidence from
simulation studies and real data shows that the RMD-MRCD-PCA method is very
successful in detecting HLPs, with negligible masking and swamping effects.
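The general idea of computing robust distances on a handful of principal component scores, rather than on all predictors, can be sketched as follows. This is a minimal illustration only and not the thesis's RMD-MRCD-PCA procedure: the MRCD estimator is replaced by a much simpler stand-in (coordinate-wise median and MAD of the scores), and the cutoff rule (median plus `c` times the MAD of the distances) and all function names are assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project rows of X onto the leading principal directions."""
    # Center by the column-wise median so a few extreme rows do not shift the origin.
    Xc = X - np.median(X, axis=0)
    # The thin SVD also works when there are more columns than rows.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def robust_distances(scores):
    """Distance of each row from a robust center, scaled by a robust spread.

    Stand-in for an MRCD-based Mahalanobis distance: uses the median and the
    normal-consistent MAD of each score column (a diagonal scatter estimate).
    """
    med = np.median(scores, axis=0)
    mad = 1.4826 * np.median(np.abs(scores - med), axis=0)
    z = (scores - med) / mad
    return np.sqrt(np.sum(z ** 2, axis=1))

def flag_hlps(X, n_components=2, c=3.0):
    """Flag rows whose robust distance exceeds median + c * MAD (assumed cutoff)."""
    d = robust_distances(pca_scores(X, n_components))
    med = np.median(d)
    mad = 1.4826 * np.median(np.abs(d - med))
    return d > med + c * mad, d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 200))   # more predictors than observations
    X[-4:] += 10.0                   # plant four high leverage rows
    flags, d = flag_hlps(X)
    print("flagged rows:", np.flatnonzero(flags))
```

Working on a few scores keeps the scatter estimate well defined even when the predictors far outnumber the observations, which is the situation the abstract describes.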
Numerous classical methods, such as leave-one-out cross-validation (LOOCV) and K-fold
cross-validation (K-FoldCV), have been developed to determine the optimal number of
PLS components. Nonetheless, they are easily affected by HLPs. Thus, robust
cross-validation techniques, denoted RMD-MRCD-PCA-LOOCV and
RMD-MRCD-PCA-K-FoldCV, are proposed to remedy this problem. The results of the
simulation study and real data sets indicate that the proposed methods successfully
select the appropriate number of PLS components.
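One way to make K-fold cross-validation less sensitive to HLPs is to exclude flagged observations from both the fit and the validation error. The sketch below illustrates that idea under stated assumptions: the abstract does not give the thesis's weighting scheme, so a hard 0/1 weight is assumed here, the `flagged` mask is assumed to come from some separate detection step, and a basic single-response NIPALS PLS is used in place of SIMPLS. All function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, k):
    """Single-response PLS (NIPALS) with k components; returns means and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # score vector
        tt = t @ t
        p = Xr.T @ t / tt               # X loading
        c = yr @ t / tt                 # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def robust_kfold_components(X, y, flagged, max_k, n_folds=5, seed=0):
    """Pick the number of components by K-fold CV that ignores flagged rows."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    keep = ~flagged
    errors = np.zeros(max_k)
    for fold in folds:
        in_fold = np.zeros(n, dtype=bool)
        in_fold[fold] = True
        train, valid = keep & ~in_fold, keep & in_fold
        for k in range(1, max_k + 1):
            x_mean, y_mean, beta = pls1_fit(X[train], y[train], k)
            pred = y_mean + (X[valid] - x_mean) @ beta
            errors[k - 1] += np.sum((y[valid] - pred) ** 2)
    errors /= keep.sum()                # mean squared error over unflagged rows
    return int(np.argmin(errors)) + 1, errors
```

A softer scheme would down-weight suspicious rows instead of dropping them; the hard cutoff is used here only to keep the example short.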
The statistically inspired modification of partial least squares (SIMPLS) is a popular
method for dealing with multicollinearity in high dimensional data. Nonetheless, the
SIMPLS method is vulnerable to the presence of HLPs. Hence, a robust weighted
SIMPLS based on RMD-MRCD-PCA (RMD-MRCD-PCA-RWSIMPLS) is established to
overcome this issue. Simulation experiments and real examples demonstrate that
the RMD-MRCD-PCA-RWSIMPLS is more efficient than the SIMPLS and the
RWSIMPLS methods.
Partial least squares discriminant analysis (PLSDA) is a popular classifier for HDD.
Nevertheless, the PLSDA is easily affected by the presence of HLPs. Hence, a robust
weighted partial least squares discriminant analysis based on the weighting function of
RMD-MRCD-PCA (RMD-MRCD-PCA-RWPLSDA) is proposed to close the gap in the
literature. The results of the simulation study and real datasets show that the
RMD-MRCD-PCA-RWPLSDA method successfully and efficiently classifies the data into
binary and multiple groups.
The Hotelling T2 method based on PLS (T2-PLS) has been proposed as a variable
selection technique in HDD. However, the T2-PLS is not resistant to HLPs. To rectify
this issue, a robust Hotelling T2 variable selection method, which is based on the
RMD-MRCD-PCA-RWSIMPLS, is proposed. The results of the simulation study and real
datasets indicate that the T2-RMD-MRCD-PCA-RWSIMPLS method successfully selects
an appropriate number of important variables to be included in the model, with the
smallest mean square error.