Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis

Murad Al-Rajab, Joan Lu, Qiang Xu

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Background and Objectives:
This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The research is motivated by the urgent and growing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, so it is vitally important that cancer tissues be identified and classified accurately and quickly, both to ensure fast detection of the disease and to expedite the drug discovery process.

Methods:
In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these.
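
The three-phase design described above can be illustrated with a small timing harness. This is only a sketch under stated assumptions, not the authors' implementation: the synthetic data, the filter-style selector `select_top_k`, and the nearest-centroid classifier `centroid_accuracy` are hypothetical stand-ins chosen so the example runs with the Python standard library alone. The abstract does not specify the evaluation protocol, so the train/test split here is also an assumption.

```python
import random
import time
from statistics import mean, pstdev

def make_data(n_samples=60, n_genes=200, n_informative=5, seed=0):
    """Synthetic expression matrix: only the first n_informative genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def select_top_k(X, y, k):
    """Stand-in selector: rank genes by class-mean difference over pooled spread."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a + b) or 1.0
        scores.append((abs(mean(a) - mean(b)) / spread, g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])

def centroid_accuracy(X_train, y_train, X_test, y_test, genes):
    """Stand-in classifier: nearest class centroid, restricted to the given genes."""
    cents = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        cents[lab] = [mean(r[g] for r in rows) for g in genes]
    hits = 0
    for row, lab in zip(X_test, y_test):
        dist = {c: sum((row[g] - m) ** 2 for g, m in zip(genes, cents[c]))
                for c in cents}
        hits += min(dist, key=dist.get) == lab
    return hits / len(y_test)

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

X, y = make_data()
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]
all_genes = list(range(len(X[0])))

# Phase One: feature selection alone (fitted on training data only).
selected, t_select = timed(select_top_k, X_train, y_train, 10)
# Phase Two: classification alone, using every gene.
acc_all, t_all = timed(centroid_accuracy, X_train, y_train, X_test, y_test, all_genes)
# Phase Three: selection followed by classification on the selected genes.
acc_sel, t_sel = timed(centroid_accuracy, X_train, y_train, X_test, y_test, selected)

print(f"selected {len(selected)} genes in {t_select:.4f}s")
print(f"all genes:      accuracy={acc_all:.2f}, time={t_all:.4f}s")
print(f"selected genes: accuracy={acc_sel:.2f}, time={t_sel:.4f}s")
```

Measuring each phase separately, as here, makes it possible to compare both accuracy and running time with and without the selection step.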

Results:
It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best on the colon dataset as a feature selection method (29 genes selected), and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%).
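
For readers unfamiliar with PSO as a feature selector, a common formulation is binary PSO, in which each particle is a 0/1 mask over the genes. The sketch below is a generic illustration under stated assumptions, not the configuration used in the paper: the hyperparameters, the `INFORMATIVE` index set, and `toy_fitness` are all hypothetical. In a real pipeline the fitness would typically be the cross-validated accuracy of a classifier such as an SVM trained on the masked genes.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=12, n_iters=25, seed=0,
               inertia=0.7, c1=1.5, c2=1.5):
    """Binary PSO: velocities pass through a sigmoid to give the
    probability that each bit of a particle's mask is set to 1."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # personal best masks
    best_fit = [fitness(p) for p in pos]       # personal best scores
    g = max(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # global best so far
    history = [g_fit]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                v = (inertia * vel[i][d]
                     + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                     + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to avoid saturation
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:
                    g_fit, g_pos = f, pos[i][:]
        history.append(g_fit)
    return g_pos, g_fit, history

# Hypothetical toy problem: three "useful" gene indices out of ten.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    """Reward covering the useful genes, lightly penalize mask size."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)

mask, fit, history = binary_pso(toy_fitness, n_features=10)
print("selected genes:", [i for i, bit in enumerate(mask) if bit])
print("best fitness:", round(fit, 3))
```

The size penalty in the fitness is one simple way to push the swarm toward small gene subsets; how strongly to weight it against accuracy is a design choice that depends on the dataset.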

Conclusions:
It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society.
Language: English
Pages: 11-24
Number of pages: 14
Journal: Computer Methods and Programs in Biomedicine
Volume: 146
Early online date: 4 May 2017
DOI: 10.1016/j.cmpb.2017.05.001
Publication status: Published - Jul 2017

Fingerprint

Colonic Neoplasms
Feature extraction
Particle swarm optimization (PSO)
Support vector machines
Drug Discovery
Research
Cause of Death
Industry
Colon
Genes
Tissue

Cite this

@article{d72799f62f84463380c9d17a3f339650,
title = "Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis",
abstract = "Background and Objectives: This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. Methods: In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. Results: It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86{\%}. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94{\%}). Conclusions: It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society.",
keywords = "Colon cancer, Algorithm efficiency, Feature selection, Classification, Gene expression",
author = "Murad Al-Rajab and Joan Lu and Qiang Xu",
year = "2017",
month = "7",
doi = "10.1016/j.cmpb.2017.05.001",
language = "English",
volume = "146",
pages = "11--24",
journal = "Computer Methods and Programs in Biomedicine",
issn = "0169-2607",
publisher = "Elsevier",

}

TY - JOUR

T1 - Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis

AU - Al-Rajab, Murad

AU - Lu, Joan

AU - Xu, Qiang

PY - 2017/7

Y1 - 2017/7

AB - Background and Objectives: This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. Methods: In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. Results: It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). Conclusions: It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society.

KW - Colon cancer

KW - Algorithm efficiency

KW - Feature selection

KW - Classification

KW - Gene expression

U2 - 10.1016/j.cmpb.2017.05.001

DO - 10.1016/j.cmpb.2017.05.001

M3 - Article

VL - 146

SP - 11

EP - 24

JO - Computer Methods and Programs in Biomedicine

T2 - Computer Methods and Programs in Biomedicine

JF - Computer Methods and Programs in Biomedicine

SN - 0169-2607

ER -