UPM Institutional Repository

Robust diagnostics and parameter estimation in linear regression for high dimensional data


Citation

Abdul Wahab, Siti Zahariah (2023) Robust diagnostics and parameter estimation in linear regression for high dimensional data. Doctoral thesis, Universiti Putra Malaysia.

Abstract

Several methods for identifying high leverage points (HLPs) in high dimensional data (HDD) have been put forth, including the Robust Mahalanobis Distance (RMD) based on the Minimum Regularized Covariance Determinant (MRCD) and Robust Principal Component Analysis (ROBPCA). However, these methods suffer from masking and swamping effects when the number of predictor variables reaches 700 or more. To address this problem, a modified HLP detection method, the Robust Mahalanobis Distance based on the combination of the Minimum Regularized Covariance Determinant and Principal Component Analysis (RMD-MRCD-PCA), is proposed. Empirical evidence from simulation studies and real data shows that the RMD-MRCD-PCA method is very successful in detecting HLPs, with negligible masking and swamping effects.

Classical methods such as leave-one-out cross-validation (LOOCV) and K-fold cross-validation (K-FoldCV) have been developed to determine the optimal number of partial least squares (PLS) components. Nonetheless, they are easily affected by HLPs. Thus, robust cross-validation techniques, denoted RMD-MRCD-PCA-LOOCV and RMD-MRCD-PCA-K-FoldCV, are proposed to remedy this problem. The results of the simulation study and real data sets indicate that the proposed methods successfully select the appropriate number of PLS components.

The statistically inspired modification of partial least squares (SIMPLS) is a popular method for dealing with multicollinearity in HDD. Nonetheless, SIMPLS is vulnerable to the presence of HLPs. Hence, a robust weighted SIMPLS based on the RMD-MRCD-PCA weighting function (RMD-MRCD-PCA-RWSIMPLS) is established to overcome this issue. Simulation experiments and real examples demonstrate that RMD-MRCD-PCA-RWSIMPLS is more efficient than the SIMPLS and RWSIMPLS methods.

Partial least squares discriminant analysis (PLSDA) is a popular classifier for HDD. Nevertheless, PLSDA is easily affected by the presence of HLPs. Hence, a robust weighted partial least squares discriminant analysis based on the RMD-MRCD-PCA weighting function (RMD-MRCD-PCA-RWPLSDA) is proposed to close this gap in the literature. The results of the simulation study and real datasets show that the RMD-MRCD-PCA-RWPLSDA method successfully and efficiently classifies data into binary and multiple groups.

The Hotelling T2 based on PLS (T2-PLS) method has been proposed as a variable selection technique for HDD. However, T2-PLS is not resistant to HLPs. To rectify this issue, a robust Hotelling T2 variable selection method based on RMD-MRCD-PCA-RWSIMPLS is proposed. The results of the simulation study and real datasets indicate that the T2-RMD-MRCD-PCA-RWSIMPLS method successfully selects an appropriate number of important variables to include in the model, with the smallest mean squared error.
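As a rough illustration of the general approach described in the first paragraph (reduce the predictors to a few principal components, then compute robust Mahalanobis distances on the scores and flag points beyond a chi-square cutoff), the following minimal Python sketch may help. It is not the author's RMD-MRCD-PCA algorithm: scikit-learn provides no MRCD estimator, so the plain Minimum Covariance Determinant (MinCovDet) stands in for MRCD, and the function name detect_hlps, the number of components, and the 0.975 cutoff are illustrative assumptions.

```python
# Illustrative sketch only: flag suspected high leverage points (HLPs) by
# projecting the predictors onto a few principal components and computing
# robust Mahalanobis distances from a robust covariance fit on the scores.
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA
from sklearn.covariance import MinCovDet  # stand-in for MRCD (not in scikit-learn)

def detect_hlps(X, n_components=5, alpha=0.975, random_state=0):
    """Return a boolean mask of rows whose squared robust Mahalanobis
    distance in PC space exceeds a chi-square cutoff."""
    scores = PCA(n_components=n_components).fit_transform(X)
    robust_cov = MinCovDet(random_state=random_state).fit(scores)
    d2 = robust_cov.mahalanobis(scores)        # squared robust distances
    cutoff = chi2.ppf(alpha, df=n_components)  # conventional chi-square cutoff
    return d2 > cutoff

# Synthetic high dimensional example with a few planted outlying rows
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 800))
X[:5] += 10.0                                  # contaminate the first 5 rows
print(np.flatnonzero(detect_hlps(X)))          # indices of flagged observations
```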
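Similarly, the robust cross-validation idea from the second paragraph (choose the number of PLS components only from observations not flagged as HLPs) can be sketched as below. This is a heavily simplified stand-in for the proposed RMD-MRCD-PCA-LOOCV/K-FoldCV procedures: flagged observations are simply dropped rather than handled by the thesis's weighting scheme, and select_n_components, the R-squared scoring, and the fold count are illustrative choices.

```python
# Illustrative sketch only: pick the number of PLS components by K-fold
# cross-validation after removing observations flagged as HLPs.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def select_n_components(X, y, hlp_mask, max_components=10, cv=5):
    """Return the number of PLS components with the best mean CV R^2,
    computed after dropping rows where hlp_mask is True."""
    Xc, yc = X[~hlp_mask], y[~hlp_mask]
    scores = []
    for k in range(1, max_components + 1):
        model = PLSRegression(n_components=k)
        scores.append(cross_val_score(model, Xc, yc, cv=cv).mean())
    return int(np.argmax(scores)) + 1

# Small synthetic usage example (no HLPs flagged here)
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 500))
beta = np.zeros(500)
beta[:3] = 1.0
y = X @ beta + rng.normal(scale=0.1, size=80)
mask = np.zeros(80, dtype=bool)
print(select_n_components(X, y, mask, max_components=5))
```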


Official URL or Download Paper: http://ethesis.upm.edu.my/id/eprint/18371

Additional Metadata

Item Type: Thesis (Doctoral)
Subject: High-dimensional data
Subject: Robust statistics
Subject: Linear models (Statistics)
Call Number: IPM 2023 12
Chairman Supervisor: Professor Habshah binti Midi, PhD
Divisions: Institute for Mathematical Research
Depositing User: Ms. Rohana Alias
Date Deposited: 04 Aug 2025 06:14
Last Modified: 04 Aug 2025 06:14
URI: http://psasir.upm.edu.my/id/eprint/118358
