Citation
Mat Radzi, Siti Fairuz (2021) Effectiveness of tree-based pipeline optimization tools and grid search method in breast cancer prediction. Masters thesis, Universiti Putra Malaysia.
Abstract
Breast cancer is known as the most prevalent cancer and a common cause
of death among Malaysian women, especially those over the age of 40. A
breast mass can usually be identified as either benign or malignant
through an invasive biopsy procedure. The treatment protocol is
allocated based on whether the mass is benign or malignant.
Fortunately, breast cancer, like many other cancer types, is curable,
and patient survival can be improved, subject to early diagnosis.
Radiographic images contain a number of features that are useful for
computer-aided diagnosis. In this thesis, the
work is divided into two main phases: 1) evaluating the reproducibility
of radiomics features derived from manual delineation and
semiautomatic segmentation after two different contrast enhancement
techniques on masses in two-dimensional (2D) mammography images,
and 2) implementing Automated Machine Learning (AutoML) to classify
the types of mass in mammogram images. With the introduction of
ML techniques, breast cancer can be diagnosed at an early stage without
any invasive and risky procedure. The methodology presented in this
research consists of several stages, including image acquisition, image
segmentation, feature extraction/selection, and classification using
AutoML. The first phase compares the reproducibility of radiomics
features obtained after the Contrast Limited Adaptive Histogram
Equalization (CLAHE) and Adaptive Histogram Equalization (AHE)
techniques. The semiautomatic segmentation technique used in the first
phase is the Active Contour Method (ACM) with 100 iterations. Three
types of radiomics features were extracted: first-order, second-order,
and shape features. In total, 37 features were extracted from each
tumor under each of the three techniques mentioned: 9 of these were
shape-based features, while 28 were texture-based features. Notably,
the CLAHE group (ICC = 0.890 ±
0.554, p < 0.05) had the highest reproducibility, compared with the
features extracted from the AHE group (ICC = 0.850 ± 0.933, p < 0.05)
and manual delineation (ICC = 0.673 ± 0.807, p > 0.05). Therefore, the
segmentation in the second phase is based on CLAHE and ACM. The
pipeline combining Principal Component Analysis (PCA) with Random
Forest (RF) classification proved to be the most reliable pipeline,
with the lowest complexity, in this research, achieving 92% accuracy,
83% precision, 100% sensitivity, and 94% ROC.
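
The accuracy, precision, and sensitivity figures reported above are all derived from the counts of a binary confusion matrix. The sketch below shows that arithmetic. It is illustrative only: the counts are hypothetical, chosen so that the rounded results agree with the reported percentages, and are not taken from the thesis. The names ConfusionCounts, accuracy, precision, and sensitivity are assumptions made for this example, as is the choice of malignant as the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class assumed to be malignant)."""
    tp: int  # malignant masses predicted malignant
    fp: int  # benign masses predicted malignant
    fn: int  # malignant masses predicted benign
    tn: int  # benign masses predicted benign


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted-malignant cases that are truly malignant."""
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly malignant cases that were detected (recall)."""
    return c.tp / (c.tp + c.fn)


# Hypothetical counts for illustration only; NOT the thesis's test set.
counts = ConfusionCounts(tp=5, fp=1, fn=0, tn=7)

print(f"accuracy:    {accuracy(counts):.0%}")
print(f"precision:   {precision(counts):.0%}")
print(f"sensitivity: {sensitivity(counts):.0%}")
```

With no false negatives, sensitivity reaches 100% even though a single false positive lowers precision, which is how the two figures can differ as they do in the abstract. The functions assume non-zero denominators; a real evaluation would need to guard against empty classes.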