UPM Institutional Repository

Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data


Citation

Baba, Ishaq Abdullahi (2022) Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data. Doctoral thesis, Universiti Putra Malaysia.

Abstract

The reweighted fast, consistent and high breakdown (RFCH) estimator is a multivariate procedure for robustly estimating the location vector and scatter matrix. It is incorporated in the robust Mahalanobis distance to detect high leverage points in a dataset, and it has shown excellent performance compared to its competitors. However, it cannot be applied when the sample size is smaller than the number of predictor variables. To address this problem, several robust procedures for high dimensional data based on the RFCH algorithm are developed. A modified reweighted fast, consistent and high breakdown (MRFCH) estimator for high dimensional data is developed; within the RFCH algorithm, it computes the robust Mahalanobis distance from the diagonal elements of the scatter matrix instead of the full matrix. The proposed method inherits the robustness properties of the original RFCH estimator. Simulation results and artificial data examples show that the proposed MRFCH is more efficient and faster than the MRCD and OGK estimators.
Outlier detection and classification are critical issues that affect prediction accuracy if not handled correctly. The Mahalanobis distance (MD) is one of the most popular multivariate analysis tools for detecting multivariate outlying observations. However, the traditional MD based on the classical mean and covariance rarely identifies all the multivariate outliers in a given dataset, a failure associated with the masking and swamping problems. Therefore, the robust location and covariance matrix based on the MRFCH are used instead of the classical estimators to tackle these problems. The proposed algorithm has been applied to detect outliers in high dimensional data. The results of the simulation study and real data sets indicate that the proposed method has high detection power with minimal misclassification error compared to the MRCD and MDP methods.
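
As a rough illustration of why a diagonal scatter matrix makes a distance computable when the sample size is smaller than the number of variables, the sketch below standardises each variable separately and sums the squared scores. It is not the MRFCH algorithm from the thesis: the coordinatewise median and MAD are stand-in robust estimates, and the function name, the simulated data, and the median-plus-3-MAD cutoff are all assumptions made for this example.

```python
import numpy as np

def diagonal_robust_distance(X, center, scale):
    """Distance that uses only per-variable scales (a diagonal scatter matrix).

    No p x p matrix is inverted, so this stays defined when n < p.
    """
    Z = (X - center) / scale
    return np.sqrt(np.sum(Z ** 2, axis=1))

# Simulated high dimensional data (assumption): n = 30 rows, p = 200 columns.
rng = np.random.default_rng(0)
n, p = 30, 200
X = rng.normal(size=(n, p))
X[:3] += 6.0  # three planted outlying rows

# Stand-in robust estimates (NOT the MRFCH location and scatter).
center = np.median(X, axis=0)
scale = 1.4826 * np.median(np.abs(X - center), axis=0)

d = diagonal_robust_distance(X, center, scale)

# Illustrative cutoff (assumption): median of the distances plus 3 scaled MADs.
d_med = np.median(d)
cutoff = d_med + 3 * 1.4826 * np.median(np.abs(d - d_med))
flagged = np.where(d > cutoff)[0]
print("flagged rows:", flagged)
```

A classical Mahalanobis distance would need the inverse of a 200 x 200 covariance matrix estimated from 30 rows, which is singular; the diagonal version sidesteps this at the cost of ignoring correlations between variables.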
Classical correlation estimators, which rely on the sample means of the dependent and independent variables, are known to be affected by outliers. Therefore, a robust weighted correlation coefficient that reduces the effect of outliers is proposed, with weights based on the RD (MRFCH). The performance of the proposed method is illustrated using a simulation study and three real data sets: glass vessel data with 1920 variables, cardiomyopathy microarray data with 6319 variables, and octane data with 226 dimensions. The results show that the robust weighted correlation based on RD (MRFCH) is more powerful and efficient than the existing methods, irrespective of dimension, sample size, and contamination level.
Sure screening methods based on correlation are popular tools for selecting the most significant variables of the true model in sparse, high dimensional analysis. In practice, however, high leverage points may lead to misleading variable selection results. Therefore, a robust sure independence screening procedure based on the weighted correlation algorithm of MRFCH is developed for high dimensional data. The results of the simulation study and real data sets indicate that the proposed MRFCHCS+LAD-SCAD estimator performs best among the methods compared in this study.
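
The idea of screening variables by a weighted correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the thesis: the weights here are hypothetical 0/1 values that simply drop two planted high leverage rows, whereas the thesis derives its weights from RD (MRFCH); the function names `weighted_corr` and `screen` and the simulated data are also invented for the example.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation; equals the usual one when weights are equal."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)

def screen(X, y, w, k):
    """Return the indices of the k variables with largest |weighted correlation|."""
    scores = np.array([abs(weighted_corr(X[:, j], y, w)) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# Simulated sparse model (assumption): only variables 0 and 1 are active.
rng = np.random.default_rng(1)
n, p = 60, 300
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=n)
X[:2] += 8.0  # two planted high leverage rows

# Hypothetical weights: zero for the planted rows, one elsewhere.
w = np.ones(n)
w[:2] = 0.0

selected = screen(X, y, w, k=10)
print("selected variables:", selected)
```

In a full pipeline the retained variables would then be passed to a penalised regression step; that step is omitted here.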


Additional Metadata

Item Type: Thesis (Doctoral)
Subject: Algorithms
Subject: Robust control
Call Number: IPM 2022 4
Chairman Supervisor: Prof. Habshah Midi, PhD
Divisions: Faculty of Science
Depositing User: Ms. Rohana Alias
Date Deposited: 05 Oct 2023 06:36
Last Modified: 05 Oct 2023 06:36
URI: http://psasir.upm.edu.my/id/eprint/104718