Classification of Breast Cancer Subtypes using Microarray RNA Expression Data
DOI: https://doi.org/10.37934/araset.46.1.7585
Keywords: Breast cancer classification, Feature selections, Machine learning
Abstract
Breast cancer is a heterogeneous disease involving molecular and cellular alterations and diverse clinical outcomes, which makes its classification a diagnostic challenge. Current practice classifies breast cancer using immunohistochemistry markers and clinical variables, but this approach has limitations due to the inclusion of other tumour subtypes and healthy individuals. Machine learning approaches based on mRNA expression data offer researchers new ways to investigate molecular biomarkers as diagnostic features. The purpose of this study is to evaluate the ranking of features (genes) produced by feature selection methods for breast cancer diagnosis. Three feature selection methods, information gain (IG), ReliefF and minimum redundancy maximum relevance (mRMR), were applied, and subsets of the top 100, 50, 25, 10, 5 and 3 ranked genes were created. Each subset was tested with support vector machine (SVM), logistic regression (LR) and random forest (RF) classifiers, and performance was assessed using a confusion matrix. The results show that IG, ReliefF and mRMR feature selection each achieved their highest accuracy with the SVM, LR and RF classifiers. mRMR with the RF classifier achieved the highest accuracy with the fewest top-ranked genes (25 genes). A hybrid feature selection approach (mRMR + SVM) improved the accuracy of the top 3 ranked genes with the SVM, LR and RF classifiers. Future work should apply other feature selection methods and classifiers to explore classification accuracy with the smallest feature subsets on multiclass cancer datasets.
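IG scores each gene on its own, whereas mRMR also penalises genes that repeat information already selected. The sketch below illustrates that difference on toy data. It is not the study's implementation: it assumes expression values have already been discretised (here to 0/1 levels), uses the difference form of the mRMR criterion (relevance minus mean redundancy), omits ReliefF, and the function names, gene names and data are all hypothetical.

```python
import math
from collections import Counter

# Assumption: expression values are already discretised (0 = low, 1 = high).


def mutual_information(x, y):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def information_gain_ranking(features, labels):
    """Rank genes by information gain, i.e. I(gene; class), highest first."""
    scores = {g: mutual_information(v, labels) for g, v in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


def mrmr_ranking(features, labels, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        scores = {}
        for g in remaining:
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(features[g], features[s]) for s in selected
                ) / len(selected)
            scores[g] = relevance[g] - redundancy
        best = max(remaining, key=scores.get)  # ties go to the earliest gene
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, binary class labels.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
features = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 1],
    "gene_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of gene_a
    "gene_c": [0, 0, 1, 1, 0, 0, 1, 1],       # independent of gene_a
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],        # carries no class information
}

print(information_gain_ranking(features, labels))
# ['gene_a', 'gene_a_copy', 'gene_c', 'noise']  (first three tie at ~0.311 bits)
print(mrmr_ranking(features, labels, 2))
# ['gene_a', 'gene_c']
```

Because `gene_a_copy` duplicates `gene_a`, IG ranks it second, while the mRMR redundancy term demotes it below the independent `gene_c`. With real microarray data, the choice of discretisation would change these scores.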
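The evaluation step described in the abstract, building nested top-k gene subsets from a ranking and scoring a classifier's predictions with a confusion matrix, can be sketched in plain Python. Training the SVM, LR or RF classifier is deliberately left out; the ranking, class labels and predictions below are hypothetical placeholders, and the helper names are illustrative rather than taken from the study.

```python
from collections import Counter

# Subset sizes listed in the abstract.
SUBSET_SIZES = (100, 50, 25, 10, 5, 3)


def top_k_subsets(ranking, sizes=SUBSET_SIZES):
    """Map each subset size k to the k highest-ranked genes.

    Sizes larger than the ranking are skipped."""
    return {k: ranking[:k] for k in sizes if k <= len(ranking)}


def confusion_matrix(y_true, y_pred, classes):
    """Rows index the true class, columns the predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def recall_per_class(matrix, classes):
    """Per-class recall (sensitivity): diagonal cell over its row total."""
    return {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}


# Hypothetical ranking of 30 genes, e.g. the output of an mRMR step.
ranking = [f"gene_{i}" for i in range(1, 31)]
subsets = top_k_subsets(ranking)
print(sorted(subsets))  # [3, 5, 10, 25]

# Hypothetical true labels and predictions from a classifier trained on one subset.
classes = [0, 1, 2]
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
matrix = confusion_matrix(y_true, y_pred, classes)
print(matrix)            # [[2, 1, 0], [0, 2, 0], [1, 0, 2]]
print(accuracy(matrix))  # 0.75
```

In a full experiment, each subset in `subsets` would be used to train and test each classifier, yielding one confusion matrix per (feature selection method, subset size, classifier) combination.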