Citation
Goh, Rui Ying
(2019)
Hybrid harmony search-artificial intelligence models in credit scoring.
Masters thesis, Universiti Putra Malaysia.
Abstract
Credit is a form of lending in advance that carries the risk of default on payments.
Credit scoring is therefore important for correctly distinguishing defaulters from non-defaulters.
Statistical models are the main approaches, but Artificial Intelligence (AI) techniques have
recently become popular due to their ability to account for flexible data patterns. Support Vector
Machines (SVM) and Random Forest (RF) are the main focus of this study due to their competitiveness
in the literature.
This study aims to address three main drawbacks of both AI techniques, i.e.
sensitivity to hyperparameters, the black-box property and the increased computational effort due to
the hyperparameter tuning procedure. Hyperparameter tuning has been a common
practice for both SVM and RF to ensure quality performance. As an alternative to the conventional
Grid Search (GS) and manual tuning (MT) approaches, automated tuning with a metaheuristic approach
(MA) has also been shown to be effective in this task. Genetic Algorithm (GA) has been the dominant
method, and other MAs attempted recently have shown the potential of MA for
hyperparameter tuning. To the best of our knowledge, Harmony Search (HS) has yet to be utilized
with SVM and RF in this domain.
For the SVM credit model, feature selection is conducted simultaneously with
hyperparameter tuning using HS, so that explanation of the attributes can focus on the reduced
feature set. For the RF credit model, HS is hybridized with RF for hyperparameter
tuning. Then, the two types of feature importance computed by the RF algorithm are utilized
to explain the attributes. Due to the increased computational effort of HS-SVM and HS-RF, a
modified HS (MHS) hybridized with SVM and RF is proposed in this study for an effective yet
efficient search.
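For readers unfamiliar with HS, the following is a minimal sketch of a basic Harmony Search loop, not the thesis implementation. It assumes continuous search ranges and a toy objective standing in for a cross-validated error; the names `harmony_search`, `toy_objective`, `hms`, `hmcr`, `par` and `bandwidth`, and all numeric defaults, are illustrative assumptions. A feature-selection variant could append binary dimensions to each harmony, which is not shown here.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.1, iterations=500, seed=0):
    """Minimise `objective` over box `bounds` with a basic Harmony Search."""
    rng = random.Random(seed)
    # Harmony memory: a pool of candidate solutions and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a random harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: perturb within a bandwidth-scaled range.
                    value += rng.uniform(-1, 1) * bandwidth * (hi - lo)
            else:
                # Random selection: sample anywhere in the allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))  # clip to the bounds

        new_score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            # Replace the worst harmony only if the new one is better.
            memory[worst], scores[worst] = new, new_score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_objective(x):
    # Hypothetical stand-in for a model's cross-validated error.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, score = harmony_search(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a hybrid such as HS-SVM or HS-RF, the objective would train and validate one classifier per candidate, which is why each iteration is costly.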
The MHS hybrid models contain four main modifications, i.e. elitism selection instead of random
selection, dynamic exploration and exploitation operators that follow step functions instead of
a static value, replacement of the bandwidth with the coefficient of variation, and the inclusion of
two additional termination criteria. To further enhance computational efficiency, the
MHS hybrid models are parallelized.
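Three of these modifications can be illustrated with small helper functions. This is a sketch under stated assumptions, not the thesis code: the abstract does not give the step levels, the elite size, or how the coefficient of variation enters the pitch adjustment, so the function names and all values below are hypothetical. The two additional termination criteria are not specified in the abstract and are therefore not shown.

```python
import statistics


def step_schedule(iteration, max_iterations, levels):
    """Piecewise-constant operator value: one level per equal-width stage."""
    stage = min(iteration * len(levels) // max_iterations, len(levels) - 1)
    return levels[stage]


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        # Assumption: fall back to zero spread when the mean is zero.
        return 0.0
    return statistics.pstdev(values) / abs(mean)


def elite_indices(scores, n_elite):
    """Indices of the n_elite lowest (best) scores, best first."""
    return sorted(range(len(scores)), key=scores.__getitem__)[:n_elite]


# Hypothetical levels for a memory-consideration rate that rises in steps.
hmcr_levels = [0.7, 0.8, 0.9]
early = step_schedule(0, 300, hmcr_levels)
late = step_schedule(299, 300, hmcr_levels)
```

A step schedule lets the search explore more in early iterations and exploit more later, while the coefficient of variation ties the adjustment width to the current spread of the harmony memory instead of a fixed bandwidth.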
The four hybrid models are evaluated against standard statistical models across three
datasets, i.e. the German and Australian credit datasets from a public repository as well
as a peer-to-peer (P2P) lending dataset from the Lending Club (LC) website, to account for
different credit data patterns. The discussion is based on discriminating ability, model
explainability and computational time.
All the hybrid models achieved higher discriminating ability than the GS-tuned models. The RF
hybrid models consistently show better discriminating ability than the other methods across the
three datasets. Compared to the SVM hybrids, the RF hybrids achieved approximately 1% improvement
on the German and Australian data, and around 4% improvement on the LC dataset. This study also
demonstrates model explainability using reduced features for MHS-SVM and feature
importance for MHS-RF. These strategies are shown to be useful for obtaining initial information
on the attributes. For both the German and Australian datasets, reduced features and feature
importance identified almost the same features as ‘important’. For the LC dataset, the end results
show only one attribute in common between the two strategies. This is believed to be due to the
different ways in which the two classifiers capture data patterns for classification.
In terms of computational time, compared to the GS-tuned models and the respective HS hybrids,
the proposed MHS-SVM and MHS-RF hybrids reported a time improvement of more than 50%, while
parallel computation saved approximately 80% of the computational time.
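The abstract does not state how the parallelization is organized. One common scheme for population-based search, sketched here as an assumption, is to score several candidate solutions concurrently. The names `evaluate_all` and `toy_objective` are hypothetical, and a thread pool is used only to keep the example short; CPU-bound model training in Python would normally need a process pool to see a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(objective, candidates, max_workers=4):
    """Score every candidate concurrently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(objective, candidates))


def toy_objective(x):
    # Hypothetical stand-in for training and validating one credit model.
    return sum(v * v for v in x)


scores = evaluate_all(toy_objective, [[1.0, 2.0], [0.0, 0.5], [3.0, 0.0]])
```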
In addition, the hybrid models with MHS reduced the computational effort while
maintaining good discriminating ability. With the parallelization of the MHS hybrid models, the
computational time is effectively reduced, with the RF hybrid models faster than the SVM hybrid
models. Although statistical models are efficient because no hyperparameter tuning procedure is
involved, their inferior performance compared to the AI models in this study indicates a failure to
capture information from the LC dataset. In terms of model performance, explainability and
computational effort, MHS-RF is the recommended credit scoring model due to its robustness in the
three aspects.