UPM Institutional Repository

Robust correlation feature selection based support vector machine approach for high dimensional datasets


Citation

Baba, Ishaq Abdullahi and Mohammed, Mohammed Bappah and Jillahi, Kamal Bakari and Umar, Aliyu and Hendi, Hasan Talib (2025) Robust correlation feature selection based support vector machine approach for high dimensional datasets. Results in Control and Optimization, 21. art. no. 100609. pp. 1-14. ISSN 2666-7207

Abstract

Correlation-based feature selection methods are popular tools for selecting the most important variables to include in the true model when analyzing sparse, high-dimensional data. In applications, anomalous observations in the predictors, the responses, or both can seriously degrade the prediction accuracy of the model and, if not correctly addressed, lead to misleading interpretations and conclusions. The curse of dimensionality is a further serious difficulty facing many existing feature selection algorithms. To achieve more reliable feature selection and prediction accuracy, a weighted sure independence screening-based support vector machine for high-dimensional datasets is proposed. The key contribution of our proposed method is that it minimizes the influence of outliers when differentiating between significant and insignificant features, which improves predictability and interpretability. Our method consists of three basic steps. In the first step, weights based on a modified reweighted fast, consistent, and high breakdown point estimator are computed. The second step uses the weights estimated in the first step to select the most important variables for the model. The third step employs the support vector machine algorithm to calculate predicted values. To demonstrate the effectiveness of the developed procedure, we used both simulated and real-life data examples. Our results show that the proposed method outperforms the other procedures by a clear margin.
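The first two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weights here are simple 0/1 hard-rejection weights from median/MAD robust z-scores, used as a stand-in for the paper's modified reweighted fast, consistent, and high breakdown point weights, and every function name (`mad_weights`, `weighted_corr`, `robust_screen`, `naive_screen`), the cutoff of 3, and the simulated data are assumptions made for illustration only.

```python
import math
import random
import statistics


def mad_weights(values, cutoff=3.0):
    """Return 0/1 weights; 0 marks a robust z-score above the cutoff.

    Assumption: a stand-in for the paper's robust weights, not their estimator.
    """
    med = statistics.median(values)
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [1.0] * len(values)
    return [1.0 if abs(v - med) / mad <= cutoff else 0.0 for v in values]


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with observation weights w."""
    sw = sum(w)
    if sw == 0:
        return 0.0
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_d(scores, d):
    """Indices of the d largest scores, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:d]


def robust_screen(columns, y, d, cutoff=3.0):
    """Rank features by absolute weighted correlation with y and keep the top d."""
    wy = mad_weights(y, cutoff)
    scores = []
    for col in columns:
        wx = mad_weights(col, cutoff)
        w = [a * b for a, b in zip(wy, wx)]  # down-weight outliers in x or y
        scores.append(abs(weighted_corr(col, y, w)))
    return top_d(scores, d)


def naive_screen(columns, y, d):
    """Same ranking with all weights equal to 1 (ordinary correlation)."""
    ones = [1.0] * len(y)
    return top_d([abs(weighted_corr(col, y, ones)) for col in columns], d)


# Simulated data (an assumption): p > n, only features 0 and 1 matter.
random.seed(0)
n, p = 200, 300
columns = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * columns[0][i] + 3 * columns[1][i] + random.gauss(0, 0.5) for i in range(n)]

# Contaminate 10 observations in the response and in irrelevant feature 7.
for i in range(10):
    y[i] = 60.0
    columns[7][i] = 60.0

robust_selected = robust_screen(columns, y, d=2)
naive_selected = naive_screen(columns, y, d=2)
print("robust screening keeps:", sorted(robust_selected))
print("naive screening keeps:", sorted(naive_selected))
```

In this sketch the contaminated feature dominates the unweighted ranking, while the weighted ranking ignores the flagged observations. The third step would then fit a support vector machine on the retained columns only, for example with scikit-learn's `SVC` or `SVR` if that library is available.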


Download File

120119.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Additional Metadata

Item Type: Article
Divisions: Institute for Mathematical Research
DOI Number: https://doi.org/10.1016/j.rico.2025.100609
Publisher: Elsevier B.V.
Keywords: Correlation; Feature selection; High dimensional data; Outliers; Support vector machine
Depositing User: Mohamad Jefri Mohamed Fauzi
Date Deposited: 23 Sep 2025 07:29
Last Modified: 23 Sep 2025 07:29
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1016/j.rico.2025.100609
URI: http://psasir.upm.edu.my/id/eprint/120119