UPM Institutional Repository

Supervised feature selection using principal component analysis


Citation

Rahmat, Fariq and Zulkafli, Zed and Ishak, Asnor Juraiza and Abdul Rahman, Ribhan Zafira and Stercke, Simon De and Buytaert, Wouter and Tahir, Wardah and Ab Rahman, Jamalludin and Ibrahim, Salwa and Ismail, Muhamad (2023) Supervised feature selection using principal component analysis. Knowledge and Information Systems, 66 (3). pp. 1955-1995. ISSN 0219-1377; eISSN: 0219-3116

Abstract

Principal component analysis (PCA) is widely used in computational fields such as computer science, pattern recognition, and machine learning because it can effectively reduce the dimensionality of high-dimensional data. In particular, it is a popular transformation method for feature extraction. In this study, we explore PCA’s ability to perform feature selection in regression applications. We introduce a new PCA-based approach, called Targeted PCA, to analyze a multivariate dataset that includes the dependent variable: it identifies the principal component with a high representation of the dependent variable and then examines that component to capture and rank the contributions of the non-dependent variables. The study also compares the selected features with those selected by Least Absolute Shrinkage and Selection Operator (LASSO) regression. Finally, the selected features were tested in two regression models: multiple linear regression (MLR) and artificial neural network (ANN). The results are presented for three datasets from the socioeconomic, environmental, and computer image processing domains. Our study found that, for 2 of the 3 random datasets, the features selected by the PCA and LASSO regression methods were more than 50% similar. In the regression predictions, the PCA-selected features differed little from the LASSO-selected features in terms of MLR prediction accuracy. However, the ANN regression demonstrated faster convergence and a greater reduction in error. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023.
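The abstract describes the Targeted PCA idea only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that the predictors and the dependent variable are standardized together, that "high representation" means the largest absolute correlation between the dependent variable and a component (loading scaled by the square root of the eigenvalue), and that predictors are ranked by their absolute loading on that component. The function name `targeted_pca_ranking` and its arguments are hypothetical.

```python
import numpy as np


def targeted_pca_ranking(X, y, feature_names):
    """Rank predictors by loading on the PC that best represents y.

    Illustrative sketch only; the selection and ranking criteria
    are assumptions, not taken from the paper.
    """
    # Stack predictors and target, then standardize each column.
    Z = np.column_stack([X, y]).astype(float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

    # Covariance of standardized data equals the correlation matrix.
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # columns are components
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negatives

    # Assumption: correlation of y (last row) with each component
    # is its loading times sqrt(eigenvalue).
    target_corr = np.abs(eigvecs[-1, :]) * np.sqrt(eigvals)
    k = int(np.argmax(target_corr))

    # Rank the non-dependent variables by absolute loading on component k.
    loadings = np.abs(eigvecs[:-1, k])
    order = np.argsort(loadings)[::-1]
    ranking = [(feature_names[i], float(loadings[i])) for i in order]
    return ranking, k
```

Calling `targeted_pca_ranking(X, y, names)` with an `(n_samples, n_features)` array `X` and a length-`n_samples` vector `y` returns the ranked `(name, loading)` pairs and the index of the selected component. How many top-ranked features to keep is left open here, since the abstract does not state a cutoff.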



Additional Metadata

Item Type: Article
Divisions: Faculty of Engineering
DOI Number: https://doi.org/10.1007/s10115-023-01993-5
Publisher: Springer Science and Business Media Deutschland GmbH
Keywords: ANN; Feature selection; LASSO; Principal component analysis; Supervised feature selection; Industry; Innovation and infrastructure; Sustainable cities and communities
Depositing User: Mohamad Jefri Mohamed Fauzi
Date Deposited: 11 Nov 2024 01:43
Last Modified: 11 Nov 2024 01:43
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1007/s10115-023-01993-5
URI: http://psasir.upm.edu.my/id/eprint/110338