UPM Institutional Repository

Development of robust procedures for partial least square regression with application to near infrared spectral data


Citation

Silalahi, Divo Dharma (2021) Development of robust procedures for partial least square regression with application to near infrared spectral data. Doctoral thesis, Universiti Putra Malaysia.

Abstract

Partial Least Square Regression (PLSR) is a multivariate method commonly used to build predictive models from Near Infrared (NIR) spectral data. Based on our experience, several weaknesses of PLSR have been identified concerning its robustness in the pre-processing and in-processing stages when outliers and High Leverage Points (HLP) exist in the dataset. To address these problems, several robust procedures for PLSR are developed. In pre-processing, a pretreatment procedure is needed to remove both additive and multiplicative baseline effects and to distinguish the scattering effect in the raw spectra. The existing methods are not very successful in removing those effects. Hence, a new robust Generalized Multiplicative Scatter Correction (GMSC) algorithm is proposed to correct the additive and/or multiplicative baseline effects when pre-processing spectra. The results indicate that the proposed method outperforms the existing methods considered in this study. In in-processing, the PLSR model is very sensitive to the number of PLS components used in the model fitting process. Several procedures for selecting the optimal number of PLS components have been developed in this regard. However, each procedure yields a different result, and to date no method has been shown to be superior. Hence, a Robust Reliable Weighted Average (RRWA-PLS) method, which does not require selecting an optimal number of PLS components, is developed by employing a weighted average strategy over multiple PLSR models of differing complexity, that is, with different numbers of PLS components. The PLSR model has no variable selection procedure that is able to remove irrelevant wavelengths. To fill this gap in the literature, a new robust wavelength selection procedure based on an input scaling method is developed using a Filter-Wrapper method. PLSR also fails to discover nonlinear structure in the original input space, so the use of classical PLSR might not be appropriate. In addition, contamination by outliers and HLP in the dataset might damage the whole data processing procedure. To address these problems, robust nonlinear solutions of PLSR are developed through kernel-based learning by nonlinearly projecting the original input data matrix into a high-dimensional feature space corresponding to the kernel. Nonlinear solutions coupled with improved robust methods, such as the Diagnostic Robust Generalized Potential (DRGP) method and the GM6-Estimator, are also introduced. Several statistical measures, such as Root Mean Squared Error (RMSE), Coefficient of Determination (R²), Ratio of Performance to Deviation (RPD), and Standard Error (SE), are used to evaluate the performance of the proposed methods. The results of the simulation study and two NIR spectral data sets, namely the NIR spectra of oil palm (Elaeis guineensis Jacq.) fresh and dried ground fruit mesocarp, show that all the proposed methods are superior to the existing methods considered in this study.
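
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of two of the ideas it mentions: correcting additive and multiplicative baseline effects, and computing the evaluation measures. It is not the thesis's method. It shows classical (non-robust) multiplicative scatter correction rather than the proposed GMSC, and the function names `msc` and `evaluate` are hypothetical. The metric definitions follow common conventions, which are assumptions here: RPD is taken as the sample standard deviation of the reference values divided by RMSE, and SE as the sample standard deviation of the residuals. The thesis may define these differently.

```python
import math
from statistics import mean, stdev


def msc(spectra):
    """Classical multiplicative scatter correction (NOT the thesis's GMSC).

    Each spectrum x is regressed on the mean spectrum m, x ~ a + b * m,
    and then corrected as (x - a) / b, which removes the additive offset a
    and the multiplicative factor b.
    """
    n_wavelengths = len(spectra[0])
    # Reference spectrum: mean value at each wavelength across samples.
    ref = [mean(s[j] for s in spectra) for j in range(n_wavelengths)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        # Ordinary least-squares slope (multiplicative) and intercept (additive).
        b = sum((r - ref_mean) * (v - x_mean) for r, v in zip(ref, x)) / ref_ss
        a = x_mean - b * ref_mean
        corrected.append([(v - a) / b for v in x])
    return corrected


def evaluate(y_true, y_pred):
    """RMSE, R², RPD and SE under the conventions assumed in the lead-in."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / len(residuals))
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "rmse": rmse,
        "r2": 1.0 - ss_res / ss_tot,
        # Assumed convention: SD of reference values divided by RMSE.
        "rpd": stdev(y_true) / rmse if rmse > 0 else float("inf"),
        # Assumed convention: sample standard deviation of the residuals.
        "se": stdev(residuals),
    }
```

Because ordinary least squares is sensitive to outliers and high leverage points, a sketch like this would inherit exactly the weakness the thesis sets out to address; a robust variant would replace the least-squares fit with a robust estimator.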



Additional Metadata

Item Type: Thesis (Doctoral)
Subject: Regression analysis
Subject: Least squares
Call Number: IPM 2021 8
Chairman Supervisor: Professor Habshah Binti Midi, PhD
Divisions: Institute for Mathematical Research
Keywords: Near Infrared, Spectral Data, Partial Least Squares, Generalized Multiplicative Scatter Correction, Average-Weighted, Number of Components, Reliability Coefficients, Variable Selection, Variable Importance Projection, Uninformative Variable Eliminations, Nonlinear, Kernel, Hilbert-Space, GM6-Estimator, Diagnostic Robust Generalized Potential.
Depositing User: Ms. Nur Faseha Mohd Kadim
Date Deposited: 19 Sep 2022 23:47
Last Modified: 19 Sep 2022 23:47
URI: http://psasir.upm.edu.my/id/eprint/98710