Citation
Dadkhah, Kourosh
(2010)
Robust Kernel Density Function Estimation.
PhD thesis, Universiti Putra Malaysia.
Abstract
Classical kernel density estimation is a commonly used method for estimating a density function. It is now evident that the accuracy of this technique is easily affected by outliers. To remedy this problem, Kim and Scott (2008) proposed an Iteratively Re-weighted Least Squares (IRWLS) algorithm for Robust Kernel Density Estimation (RKDE). However, the weakness of the IRWLS-based estimator is its very long computation time. This shortcoming has inspired us to propose new non-iterative, unsupervised approaches that are faster, more accurate and more flexible. The proposed estimators are based on our newly developed Robust Kernel Weight Function (RKWF) and Robust Density Weight Function (RDWF). The basic idea of the RKWF-based method is to first define a function that measures the outlying distance of each observation. The resulting distances are then transformed into robust weights. The observation of Chandola et al. (2009) that normal (clean) data appear in high-probability regions of a stochastic model, while outliers appear in low-probability regions, has motivated us to develop the RDWF. Based on this notion, we employ a pilot (preliminary) estimate of the density function as an initial similarity (or distance) measure between each observation and its neighbours. The modified similarity measures produce the robust weights. Subsequently, the robust weights are incorporated in the kernel function to formulate the robust density function estimator. An extensive simulation study has been carried out to assess the performance of the RKWF-based and RDWF-based estimators. The RKDEs based on RKWF and RDWF perform as well as the classical Kernel Density Estimator (KDE) on outlier-free data sets. On contaminated data sets, they are faster, more accurate and more reliable than the IRWLS approach.
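The idea of turning a pilot density estimate into observation weights can be illustrated with a minimal Python sketch. This is not the RDWF defined in the thesis: the abstract does not specify how the similarity measures are modified, so the weighting rule used here (weights proportional to the pilot density at each observation), the Gaussian kernel, the bandwidth and all function names are assumptions made purely for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def weighted_kde(x, data, weights, h):
    """Weighted KDE at x: (1/h) * sum_i w_i * K((x - x_i) / h), weights summing to 1."""
    return sum(w * gaussian_kernel((x - xi) / h) for xi, w in zip(data, weights)) / h

def pilot_density_weights(data, h):
    """Hypothetical rule: weight each observation by its pilot (classical) KDE value."""
    n = len(data)
    uniform = [1.0 / n] * n
    pilot = [weighted_kde(xi, data, uniform, h) for xi in data]
    total = sum(pilot)
    return [p / total for p in pilot]

# Five clustered observations and one outlier at 8.0 (made-up data).
data = [-0.5, -0.2, 0.0, 0.1, 0.4, 8.0]
h = 0.5  # assumed bandwidth

uniform = [1.0 / len(data)] * len(data)
weights = pilot_density_weights(data, h)

classical_at_outlier = weighted_kde(8.0, data, uniform, h)
robust_at_outlier = weighted_kde(8.0, data, weights, h)
```

In this sketch the outlier lies in a low-density region of the pilot estimate, so it receives the smallest weight and contributes less mass to the weighted estimate than to the classical one. The computation is a single pass over the data rather than an iterative re-weighting loop.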
The classical kernel density function estimation approach is widely used within various formulas and methods. Unfortunately, many researchers are not aware that the KDE is easily affected by outliers. We have proposed the RKDE, which is more efficient and consumes less time. Our work on the RKDE and its corresponding robust weights has motivated us to develop alternative location and scale estimators. A modification is made to the classical location and scale estimators by incorporating the robust weights and the RKDE. To evaluate the efficiency of the proposed method, comprehensive contaminated models are designed and simulated. The accuracy of the proposed method is compared with that of the location and scale estimators based on the M, Minimum Covariance Determinant (MCD) and Minimum Volume Ellipsoid (MVE) estimators. The simulation study demonstrates that, on the whole, the proposed method is more accurate than the competing methods.
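As a rough illustration of how robust weights can enter location and scale estimation, the sketch below computes a weighted mean and weighted standard deviation. The exact estimators in the thesis are not given in the abstract; the function name, the data and the weight values are hypothetical.

```python
import math

def weighted_location_scale(data, weights):
    """Weighted mean and weighted standard deviation (weights need not sum to 1)."""
    total = sum(weights)
    loc = sum(w * x for x, w in zip(data, weights)) / total
    var = sum(w * (x - loc) ** 2 for x, w in zip(data, weights)) / total
    return loc, math.sqrt(var)

# Five clustered observations and one outlier at 8.0 (made-up data).
data = [-0.5, -0.2, 0.0, 0.1, 0.4, 8.0]

equal_weights = [1.0] * len(data)                  # classical estimators
robust_weights = [1.0, 1.0, 1.0, 1.0, 1.0, 0.01]   # hypothetical: outlier down-weighted

classical_loc, classical_scale = weighted_location_scale(data, equal_weights)
robust_loc, robust_scale = weighted_location_scale(data, robust_weights)
```

With equal weights the single outlier pulls the location estimate well away from the cluster and inflates the scale; with the outlier down-weighted, both estimates stay close to those of the clean observations.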
The research also develops two new approaches for detecting outliers and potential outliers in unimodal and multimodal distributions. The first method, intended for unimodal distributions, incorporates the distance of each observation from the centre of the data set. The second method is usable not only for unimodal distributions but also for multimodal distributions. This approach incorporates robust weights, whereby high weights are assigned to normal (clean) observations and low weights to outlying observations.
In this thesis, we also illustrate that the sensitivity of the RKDE depends on the setting of the tuning constants of the employed loss function. The results of the study indicate that the proposed methods are capable of labelling normal observations and potential outliers in a data set. Additionally, they are able to assign anomaly scores to normal and outlying observations.
Finally, this thesis addresses the estimation of Mutual Information (MI) for mixture distributions, which are prone to create two distant groups in the data. The formulation of MI involves estimation of the density function; for bivariate random variables, it involves bivariate density estimation. The bivariate density estimate employs an estimate of the covariance matrix. The sensitivity of the covariance matrix to the presence of outliers has motivated us to substitute it with robust estimates derived from MCD and MVE. The efficiency of the modified mutual information estimate is evaluated based on its accuracy. For this evaluation, mixtures of bivariate normal distributions with different mixing percentages are simulated. Simulation results show that the new formulation of MI increases the accuracy of mutual information estimation.
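For reference, the quantities involved can be written out as follows. The definition of mutual information is standard; the Gaussian-kernel form of the bivariate density estimate, the bandwidth symbol h and the notation S, S_MCD and S_MVE are assumptions introduced here for illustration, since the abstract does not state the exact formulation used in the thesis.

```latex
% Mutual information of a bivariate random variable (X, Y)
I(X;Y) = \iint f_{XY}(x,y)\,
         \log\frac{f_{XY}(x,y)}{f_X(x)\,f_Y(y)}\,dx\,dy

% Assumed Gaussian-kernel bivariate density estimate at z = (x, y)^\top,
% with observations z_1, \dots, z_n, bandwidth h and covariance estimate S
\hat{f}_{XY}(z) = \frac{1}{n}\sum_{i=1}^{n}
    \frac{1}{2\pi h^{2}\,|S|^{1/2}}
    \exp\!\left(-\frac{(z - z_i)^{\top} S^{-1} (z - z_i)}{2h^{2}}\right)

% Robust variant: replace the sample covariance S by S_{\mathrm{MCD}} or S_{\mathrm{MVE}}
```

Under this form, the covariance estimate sets the shape and orientation of every kernel, so a covariance matrix inflated by outliers or by two distant groups distorts the density estimate and, through it, the MI estimate.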