Feature Selection: Filter Methods Performance Challenges

Marianne Cherrington, Fadi Thabtah, Joan Lu, Qiang Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Learning is the heart of intelligence. The focus in machine learning is to automate methods that achieve objectives, improve predictions or encourage informed behavior. Feature selection is a vital step in data analysis that often reduces dataset dimensionality by eliminating irrelevant and/or redundant attributes, either to simplify the learning process or to improve the quality of its outcomes. This research critically analyses different filter methods based on ranking procedures (Information Gain (IG), Chi-square (CHI), V-score, Fisher Score, mRMR, Va and ReliefF) and identifies possible challenges that arise. We particularly concentrate on how threshold determination can affect the results of different filter methods based on ranked scores. We show that this issue is vital, especially in the era of big data, in which users deal with attributes numbering in the tens of thousands but only a limited number of instances.
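To make the threshold issue concrete, the following is a minimal sketch and not the authors' implementation. It ranks discrete features by Information Gain and then applies two common cutoffs, a top-k rule and a minimum-score rule. The toy data, the feature names and the helper functions (`rank_features`, `select_top_k`, `select_above`) are assumptions introduced purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, score) pairs sorted from highest to lowest IG."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_top_k(ranked, k):
    """Threshold rule 1: keep the k best-ranked features."""
    return [name for name, _ in ranked[:k]]


def select_above(ranked, cutoff):
    """Threshold rule 2: keep every feature scoring at least `cutoff`."""
    return [name for name, score in ranked if score >= cutoff]


# Hypothetical toy dataset: 8 instances, 3 binary features, binary class.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "f_perfect": [1, 1, 1, 1, 0, 0, 0, 0],  # identical to the class
    "f_partial": [1, 1, 1, 0, 1, 0, 0, 0],  # agrees on 6 of 8 instances
    "f_noise":   [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the class
}

ranked = rank_features(columns, labels)
for name, score in ranked:
    print(f"{name}: {score:.4f}")

print("top-2:       ", select_top_k(ranked, 2))
print("score >= 0.5:", select_above(ranked, 0.5))
```

On this toy data the ranking is the same under both rules, yet the selected subsets differ: the top-2 rule keeps the partially informative feature while the 0.5 score cutoff discards it. That is the kind of sensitivity to threshold choice the abstract refers to.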

Language: English
Title of host publication: 2019 International Conference on Computer and Information Sciences, ICCIS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-4
Number of pages: 4
ISBN (Electronic): 9781538681251
ISBN (Print): 9781538681268
DOI: https://doi.org/10.1109/ICCISci.2019.8716478
Publication status: Published - 16 May 2019
Event: 2019 International Conference on Computer and Information Sciences, ICCIS 2019 - Sakaka, Saudi Arabia
Duration: 3 Apr 2019 to 4 Apr 2019

Conference

Conference: 2019 International Conference on Computer and Information Sciences, ICCIS 2019
Country: Saudi Arabia
City: Sakaka
Period: 3/04/19 to 4/04/19

Fingerprint

Learning systems
Feature extraction
Big data
Feature selection
Filter
Prediction
Machine learning
Dimensionality
Learning process
Ranking

Cite this

Cherrington, M., Thabtah, F., Lu, J., & Xu, Q. (2019). Feature Selection: Filter Methods Performance Challenges. In 2019 International Conference on Computer and Information Sciences, ICCIS 2019 (pp. 1-4). [8716478] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICCISci.2019.8716478
@inproceedings{b9818d2b380e422bb35d17bfac271ac5,
title = "Feature Selection: Filter Methods Performance Challenges",
keywords = "Data mining, Feature ranking, Feature selection, Filter methods, Machine learning",
author = "Marianne Cherrington and Fadi Thabtah and Joan Lu and Qiang Xu",
year = "2019",
month = "5",
day = "16",
doi = "10.1109/ICCISci.2019.8716478",
language = "English",
isbn = "9781538681268",
pages = "1--4",
booktitle = "2019 International Conference on Computer and Information Sciences, ICCIS 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
