Feature Selection Methods for Linked Data: Limitations, Capabilities and Potentials

Marianne Cherrington, David Airehrour, Joan Lu, Qiang Xu, Stephen Wade, Samaneh Madanian

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Feature selection is an important pre-processing, data mining, and knowledge discovery tool for data analysis. By eliminating redundant and irrelevant features from high-dimensional data, feature selection mitigates the 'curse of dimensionality' and improves performance. Data are becoming increasingly complex; heterogeneous data may often be viewed as natural collections of linked objects. Linked data are structured data that are connected with other data sources through the use of semantic queries. They are increasingly prevalent in social media websites and biological networks. Many feature selection methods assume independent and identically distributed (IID) data, a condition violated by linked data. This paper reviews current feature selection techniques for linked data. Several approaches are examined in various contexts so that performance issues and ongoing challenges can be assessed. The major contribution of this paper is to underscore contemporary uses and limitations of linked data feature selection techniques, with the purpose of clarifying existing capabilities and current potential for key areas of future research and application.
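The idea of eliminating irrelevant and redundant features can be illustrated with a minimal sketch. This is not a method from the paper: it is a generic correlation-based filter that treats rows as IID, which is exactly the assumption the abstract says linked data violate. The function names, thresholds, and toy data are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Greedy filter: keep relevant features, skip those redundant with kept ones.

    columns: dict mapping feature name -> list of values
    target:  list of label values
    The two thresholds are illustrative assumptions, not values from the paper.
    """
    # Relevance = absolute correlation between a feature and the target.
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)

    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            continue  # irrelevant: carries little information about the target
        if any(abs(pearson(columns[name], columns[kept])) > max_redundancy
               for kept in selected):
            continue  # redundant: nearly duplicates a feature already kept
        selected.append(name)
    return selected


# Hypothetical toy data: two near-duplicate height columns and one noise column.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "noise": [3, 1, 3, 1, 3],
}
label = [50, 58, 66, 77, 85]

selected = select_features(data, label)
print(selected)
```

On this toy input the filter keeps one of the two height columns and drops the noise column. A filter like this scores each feature from the attribute table alone; it has no way to use links between instances, which is the gap that linked-data feature selection methods aim to address.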

Original language: English
Title of host publication: BDCAT 2019
Subtitle of host publication: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Pages: 103-112
Number of pages: 10
ISBN (Print): 9781450370165
DOI: https://doi.org/10.1145/3365109.3368792
Publication status: Published - 2 Dec 2019
Event: 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies - Auckland, New Zealand
Duration: 2 Dec 2019 to 5 Dec 2019
Conference number: 6

Conference

Conference: 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Abbreviated title: BDCAT 2019
Country: New Zealand
City: Auckland
Period: 2/12/19 to 5/12/19

Fingerprint

Feature extraction
Data mining
Websites
Semantics
Feature selection
Linked data
Processing
social media
performance
website
data analysis
semantics
knowledge

Cite this

Cherrington, M., Airehrour, D., Lu, J., Xu, Q., Wade, S., & Madanian, S. (2019). Feature Selection Methods for Linked Data: Limitations, Capabilities and Potentials. In BDCAT 2019: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (pp. 103-112). New York: Association for Computing Machinery, Inc. https://doi.org/10.1145/3365109.3368792
@inproceedings{b640be181a604e22957b20069397009d,
title = "Feature Selection Methods for Linked Data: Limitations, Capabilities and Potentials",
keywords = "Dimensionality Reduction, Feature Selection (FS), Heterogeneous Data, High-Dimensional Data (HDD), Linked Data (LD)",
author = "Marianne Cherrington and David Airehrour and Joan Lu and Qiang Xu and Stephen Wade and Samaneh Madanian",
year = "2019",
month = "12",
day = "2",
doi = "10.1145/3365109.3368792",
language = "English",
isbn = "9781450370165",
pages = "103--112",
booktitle = "BDCAT 2019",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN
T1 - Feature Selection Methods for Linked Data
T2 - Limitations, Capabilities and Potentials
AU - Cherrington, Marianne
AU - Airehrour, David
AU - Lu, Joan
AU - Xu, Qiang
AU - Wade, Stephen
AU - Madanian, Samaneh
PY - 2019/12/2
Y1 - 2019/12/2
KW - Dimensionality Reduction
KW - Feature Selection (FS)
KW - Heterogeneous Data
KW - High-Dimensional Data (HDD)
KW - Linked Data (LD)
UR - http://www.scopus.com/inward/record.url?scp=85077340844&partnerID=8YFLogxK
U2 - 10.1145/3365109.3368792
DO - 10.1145/3365109.3368792
M3 - Conference contribution
AN - SCOPUS:85077340844
SN - 9781450370165
SP - 103
EP - 112
BT - BDCAT 2019
PB - Association for Computing Machinery, Inc
CY - New York
ER -
