UPM Institutional Repository

Effectiveness of tree-based pipeline optimization tools and grid search method in breast cancer prediction


Citation

Mat Radzi, Siti Fairuz (2021) Effectiveness of tree-based pipeline optimization tools and grid search method in breast cancer prediction. Masters thesis, Universiti Putra Malaysia.

Abstract

Breast cancer is known to be the most prevalent and common cause of death among Malaysian women, especially those over the age of 40. A breast mass can usually be identified as either benign or malignant through an invasive biopsy procedure. The treatment protocol is assigned based on whether the mass is benign or malignant. Fortunately, breast cancer, like many other cancer types, is curable, and patient survival can be improved, subject to early diagnosis. Radiographic images contain numerous features that are useful for computer-aided diagnosis. In this thesis, the work is divided into two main phases: 1) evaluating the reproducibility of radiomics features derived from manual delineation and semiautomatic segmentation after two different contrast enhancement techniques on masses in two-dimensional (2D) mammography images, and 2) implementing Automated Machine Learning (AutoML) to classify types of mass in mammogram images. With the introduction of machine learning (ML) techniques, breast cancer can be diagnosed at an early stage without any invasive and risky procedure. The methodology presented in this research consists of several stages, including image acquisition, image segmentation, feature extraction/selection, and classification using AutoML. The first phase determines the reproducibility of features between the Contrast Limited Adaptive Histogram Equalization (CLAHE) and Adaptive Histogram Equalization (AHE) techniques. The semiautomatic segmentation technique used in the first phase is the Active Contour Method (ACM) with 100 iterations. Three types of radiomics features were extracted: first-order, second-order, and shape features. In total, 37 features were extracted from each tumor under each of the three techniques mentioned: 9 were shape-based features and 28 were texture-based features.
Notably, the features extracted from the CLAHE group (ICC = 0.890 ± 0.554, p < 0.05) had the highest reproducibility, compared with the features extracted from the AHE group (ICC = 0.850 ± 0.933, p < 0.05) and from manual delineation (ICC = 0.673 ± 0.807, p > 0.05). Therefore, the segmentation used in the second phase is based on CLAHE and ACM. The Principal Component Analysis (PCA) and Random Forest (RF) classification pipeline proved to be the most reliable pipeline with the lowest complexity in this research, with 92% accuracy, 83% precision, 100% sensitivity, and 94% ROC.
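
The classification stage described in the abstract can be illustrated with a minimal sketch of a PCA plus Random Forest pipeline tuned by grid search. This is not the thesis's code: it uses scikit-learn's GridSearchCV rather than an AutoML tool, the data are synthetic (only the count of 37 features is taken from the abstract), and the scaling step, parameter grid, train/test split, and all variable names are assumptions chosen for illustration.

```python
# Illustrative sketch only: synthetic data stands in for the radiomics features.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 masses x 37 features, binary labels.
X, y = make_classification(n_samples=200, n_features=37, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaling before PCA is an assumption; the abstract does not mention it.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Assumed search grid; step-name prefixes route each value to its step.
param_grid = {
    "pca__n_components": [5, 10, 20],
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 5],
}

# Exhaustive search with 5-fold cross-validation on the training split.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out split.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "sensitivity": recall_score(y_test, y_pred),  # recall of the positive class
    "roc_auc": roc_auc_score(y_test, y_score),
}
print(search.best_params_)
print(metrics)
```

The four reported quantities mirror the metrics named in the abstract; the values printed here come from synthetic data and say nothing about the thesis's results.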



Additional Metadata

Item Type: Thesis (Masters)
Subject: Radiography, Medical
Subject: BRCA genes
Call Number: FS 2022 4
Chairman Supervisor: Dr. Muhammad Khalis Bin Abdul Karim, PhD
Divisions: Faculty of Science
Depositing User: Ms. Rohana Alias
Date Deposited: 26 Jul 2023 02:12
Last Modified: 26 Jul 2023 02:12
URI: http://psasir.upm.edu.my/id/eprint/104309