Citation
Silalahi, Divo Dharma (2021)
Development of robust procedures for partial least square regression with application to near infrared spectral data.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
Partial Least Squares Regression (PLSR) is a multivariate method
commonly used to build predictive models from Near Infrared (NIR) spectral data.
Based on our experience, PLSR has several robustness weaknesses in both the
pre-processing and in-processing stages when outliers and High Leverage
Points (HLP) exist in the dataset. To address these problems, several robust
procedures for PLSR are developed.
In the pre-processing stage, a pretreatment procedure is needed to remove both
additive and multiplicative baseline effects and to distinguish the scattering
effect in the raw spectra. The existing methods are not very successful in
removing those effects. Hence, a new robust Generalized Multiplicative Scatter
Correction (GMSC) algorithm is proposed to correct the additive and/or
multiplicative baseline effects during spectral pre-processing. The results
indicate that the proposed method outperforms the existing methods considered
in this study.
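To make the additive and multiplicative baseline effects concrete, the following is a minimal sketch of classical Multiplicative Scatter Correction (MSC), which is the non-robust starting point and not the GMSC algorithm proposed in the thesis. Each spectrum is regressed on a reference spectrum by ordinary least squares, and the fitted offset (additive effect) and slope (multiplicative effect) are then removed. The function name `msc` and the synthetic spectra are assumptions made for illustration only.

```python
import numpy as np

def msc(spectra, reference=None):
    """Classical MSC (illustrative; not the thesis's robust GMSC).

    spectra: array of shape (n_samples, n_wavelengths).
    reference: reference spectrum; defaults to the mean spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        # Fit x ~ a + b * reference by ordinary least squares.
        # np.polyfit returns the slope first, then the intercept.
        b, a = np.polyfit(reference, x, deg=1)
        # Remove the additive offset a and the multiplicative gain b.
        corrected[i] = (x - a) / b
    return corrected

# Synthetic example (assumed data): one base curve distorted by
# sample-specific offsets and gains.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
offsets = np.array([0.0, 0.5, -0.3])
gains = np.array([1.0, 1.4, 0.8])
raw = offsets[:, None] + gains[:, None] * base

corrected = msc(raw, reference=base)
```

Because the per-spectrum fit here is ordinary least squares, a few outlying wavelengths can distort the estimated offset and slope, which is the kind of sensitivity a robust variant is meant to reduce.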
In the in-processing stage, the PLSR model is very sensitive to the number of
PLS components used in the model fitting process. Several procedures for
selecting the optimal number of PLS components have been developed in
this regard. However, each procedure yields a different result, and to date no
single procedure has been shown to be superior. Hence, a Robust
Reliable Weighted Average (RRWA-PLS) method, which does not require the
selection of an optimal number of PLS components, is developed by applying a
weighted average strategy to multiple PLSR models fitted with different numbers
of PLS components. The PLSR model also has no variable selection procedure
that is able to remove irrelevant wavelengths. To fill this gap in the
literature, a new robust wavelength selection procedure based on an input
scaling method is developed using a Filter-Wrapper approach. In addition, PLSR
fails to capture nonlinear structure in the original input space, so the use of
classical PLSR might not be appropriate, and contamination by outliers and HLP
in the dataset might damage the whole data processing procedure. To address
these problems, robust nonlinear solutions of PLSR are developed through
kernel-based learning by nonlinearly mapping the original input data matrix
into a high-dimensional feature space corresponding to the kernel. Nonlinear
solutions coupled with improved robust methods, such as the Diagnostic Robust
Generalized Potential (DRGP) method and the GM6-Estimator, are also introduced.
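The weighted average strategy over PLSR models of different complexity, described above, can be sketched as follows. This is only an illustration under stated assumptions: the weights are taken as inversely proportional to each model's validation mean squared error, which is a stand-in and not the reliability-based weighting of RRWA-PLS, and no robust estimation is included. The helper names `pls1_fit` and `weighted_average_pls`, and the simulated data, are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) by NIPALS; return (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered, to be deflated
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weight vector
        t = E @ w                            # score vector
        tt = t @ t
        p = E.T @ t / tt                     # X loading
        c = f @ t / tt                       # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    return coef, y_mean - x_mean @ coef

def weighted_average_pls(X_tr, y_tr, X_val, y_val, max_components):
    """Average PLS1 models with 1..max_components components."""
    models = [pls1_fit(X_tr, y_tr, k) for k in range(1, max_components + 1)]
    mse = np.array([np.mean((y_val - (X_val @ b + b0)) ** 2)
                    for b, b0 in models])
    # Assumed weighting rule: inverse validation MSE, normalized to sum to 1.
    weights = (1.0 / mse) / np.sum(1.0 / mse)
    coef = sum(w * b for w, (b, _) in zip(weights, models))
    intercept = sum(w * b0 for w, (_, b0) in zip(weights, models))
    return coef, intercept, weights, np.sqrt(mse)

# Simulated data (assumed): 60 samples, 20 predictors, linear signal plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=60)
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

coef, intercept, weights, rmse_each = weighted_average_pls(
    X_tr, y_tr, X_val, y_val, max_components=5)
rmse_avg = np.sqrt(np.mean((y_val - (X_val @ coef + intercept)) ** 2))
```

Since each PLSR model is linear, averaging the coefficient vectors is equivalent to averaging the predictions, so the combined model never has to commit to a single number of components. In practice the weights should come from data not used for the final assessment.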
Several statistical measures, namely Root Mean Squared Error (RMSE),
Coefficient of Determination (R²), Ratio of Performance to Deviation (RPD), and
Standard Error (SE), are used to evaluate the performance of the proposed
methods. Results from a simulation study and from two NIR spectral data sets,
namely the NIR spectra of oil palm (Elaeis guineensis Jacq.) fresh and dried
ground fruit mesocarp, show that all the proposed methods outperform the
existing methods considered in this study.
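For reference, the evaluation measures named above can be computed as in the sketch below. The abstract does not give the exact formulas, so the definitions here are common chemometrics conventions and should be read as assumptions: RPD is taken as the sample standard deviation of the reference values divided by RMSE, and SE as the bias-corrected standard error of the residuals. The function name `regression_metrics` and the example values are illustrative.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, R2, RPD and SE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    # R2 = 1 - residual sum of squares / total sum of squares.
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Assumed convention: RPD = SD of reference values / RMSE.
    rpd = np.std(y_true, ddof=1) / rmse
    # Assumed convention: SE = SD of residuals about their mean (bias removed).
    se = np.std(resid, ddof=1)
    return {"RMSE": rmse, "R2": r2, "RPD": rpd, "SE": se}

# Made-up values, for illustration only.
metrics = regression_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
```

Lower RMSE and SE and higher R² and RPD indicate better predictive performance under these definitions.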
Additional Metadata
Item Type: Thesis (Doctoral)
Subject: Regression analysis
Subject: Least squares
Call Number: IPM 2021 8
Chairman Supervisor: Professor Habshah Binti Midi, PhD
Divisions: Institute for Mathematical Research
Keywords: Near Infrared, Spectral Data, Partial Least Squares, Generalized Multiplicative Scatter Correction, Average-Weighted, Number of Components, Reliability Coefficients, Variable Selection, Variable Importance Projection, Uninformative Variable Eliminations, Nonlinear, Kernel, Hilbert-Space, GM6-Estimator, Diagnostic Robust Generalized Potential.
Depositing User: Ms. Nur Faseha Mohd Kadim
Date Deposited: 19 Sep 2022 23:47
Last Modified: 19 Sep 2022 23:47
URI: http://psasir.upm.edu.my/id/eprint/98710