UPM Institutional Repository

Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers


Citation

Uraibi, Hassan S. (2016) Robust variable selection methods for large-scale data in the presence of multicollinearity, autocorrelated errors and outliers. Doctoral thesis, Universiti Putra Malaysia.

Abstract

A robust correlation coefficient based on a robust multivariate location and scatter estimator, such as the Fast Minimum Covariance Determinant (Fast MCD), is not a feasible option for high-dimensional data because it is time-consuming to compute. The robust adjusted Winsorization correlation (Adj.Winso.cor) has been put forward to overcome this problem. Unfortunately, Adj.Winso.cor yields very poor results in the presence of multivariate outliers. Hence, we propose a robust multivariate correlation matrix based on the Reweighted Fast Consistent and High breakdown (RFCH) estimator. The findings show that RFCH.cor is more robust than Adj.Winso.cor in the presence of multivariate outliers.

Forward selection (FS) is a very effective variable selection procedure for selecting a parsimonious subset of covariates from a large number of candidate covariates. However, FS is not robust to outliers. A robust forward selection method (FS.Winso), based on partial correlations derived from Maronna’s bivariate M-estimator of the scatter matrix and the adjusted Winsorization pairwise correlation, has been introduced in the literature to overcome the problem of outliers. Because FS.Winso is not robust to multivariate outliers, we develop a Robust Forward Selection algorithm based on the RFCH correlation coefficient (RFS.RFCH). The results of our study indicate that RFS.RFCH is more efficient than FS and FS.Winso.

The existing Robust-LARS based on the Winsorization correlation (RLARS-Winsor) has the drawback that it is not robust in the presence of multivariate outliers. Hence, a Robust-LARS based on the √n-consistent multivariate RFCH correlation matrix (RLARS-RFCH) is developed. The proposed method is computationally efficient and outperforms RLARS-Winsor.

The all-possible-subsets algorithm is greedy, and it is inefficient and unstable in the presence of autocorrelated errors and outliers. To overcome this selection instability, a stability selection approach has been put forward to enhance the performance of single-split variable selection methods. Unfortunately, the classical stability selection procedure is very sensitive to outliers and serially correlated errors. A stability selection procedure based on the RFCH estimator is therefore developed. The results of the study show that our proposed Robust Multi Split based on RFCH successfully and consistently selects the correct variables in the final model.

Thus far, no variable selection procedure in the literature deals with the problem of high multicollinearity in the presence of outliers. Hence, Robust Non-Grouped variable selection (RNGVS.RFCH) for data with high multicollinearity and outliers is developed. The results signify that our proposed RNGVS.RFCH method is able to correctly select the important variables in the final model.

Not much research has focused on the problem of large data in the presence of outliers and autocorrelated errors. In this situation, the existing Elastic-Net and RE-Net methods are not capable of selecting the important variables in the final model. Thus, a new method that we call before and after elastic-net (BAE-Net) regression is proposed. The Reweighted Multivariate Normal (RMVN) algorithm is incorporated in the BAE-Net algorithm. BAE-Net is found to do a credible job of selecting the correct important variables in the final model.
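
As an illustration of the correlation-based forward selection idea mentioned in the abstract, the following is a minimal sketch and not the thesis's RFS.RFCH implementation. It assumes the selection step needs only a correlation matrix, so that a robust matrix (for example one derived from an RFCH estimate, which is not implemented here) could be supplied in place of the classical one. The function names `partial_corr` and `forward_select` and the matrix `demo_corr` are hypothetical, chosen for this example only.

```python
import math

EPS = 1e-12  # guard against zero or slightly negative residual variances


def partial_corr(resid, j, response):
    """Absolute partial correlation of variable j with the response,
    given the variables already partialled out of `resid`."""
    prod = resid[j][j] * resid[response][response]
    if prod <= EPS:
        return 0.0
    return abs(resid[j][response]) / math.sqrt(prod)


def forward_select(corr, response, n_steps):
    """Forward selection driven only by a correlation matrix.

    corr     : square correlation matrix (list of lists) over all
               predictors and the response; assumed positive definite.
    response : index of the response variable in `corr`.
    Returns the predictor indices in their order of entry.
    """
    p = len(corr)
    resid = [row[:] for row in corr]  # residual covariance matrix
    candidates = [j for j in range(p) if j != response]
    selected = []
    for _ in range(min(n_steps, len(candidates))):
        # Pick the candidate most correlated with what is left of the response.
        best = max(candidates, key=lambda j: partial_corr(resid, j, response))
        pivot = resid[best][best]
        if pivot <= EPS:
            break  # remaining candidates carry no residual variance
        selected.append(best)
        candidates.remove(best)
        # Partial the chosen variable out of every remaining pair.
        resid = [[resid[i][j] - resid[i][best] * resid[best][j] / pivot
                  for j in range(p)] for i in range(p)]
    return selected


# Hypothetical correlation matrix: x0 and x1 are highly collinear,
# x2 is uncorrelated with both, and index 3 is the response.
demo_corr = [
    [1.00, 0.90, 0.00, 0.80],
    [0.90, 1.00, 0.00, 0.75],
    [0.00, 0.00, 1.00, 0.40],
    [0.80, 0.75, 0.40, 1.00],
]
order = forward_select(demo_corr, response=3, n_steps=3)
print(order)  # [0, 2, 1]
```

In this made-up example, x1 has a high marginal correlation with the response but enters last, because most of that correlation is already explained by the collinear x0. Since the procedure only ever touches the correlation matrix, its robustness depends entirely on how that matrix was estimated.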



Additional Metadata

Item Type: Thesis (Doctoral)
Subject: Robust statistics
Subject: Outliers (Statistics)
Subject: Multicollinearity
Call Number: IPM 2016 5
Chairman Supervisor: Professor Habshah Midi, PhD
Divisions: Faculty of Science
Depositing User: Ms. Nur Faseha Mohd Kadim
Date Deposited: 29 Oct 2019 06:54
Last Modified: 29 Oct 2019 06:54
URI: http://psasir.upm.edu.my/id/eprint/69762