High-dimensional Data Clustering with Fuzzy C-Means: Problem, Reason, and Solution

Yinghua Shen, Hanyu E, Tianhua Chen, Zhi Xiao, Bingsheng Liu, Yuan Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Fuzzy C-Means (FCM) clustering algorithm is a popular unsupervised learning approach that has been used extensively in various domains. In this study, however, we point out a major problem that FCM faces when it is applied to high-dimensional data: quite often, the obtained prototypes (cluster centers) cannot be distinguished from one another. Many studies have claimed that the concentration of distance (CoD) could be a major reason for this phenomenon. This paper therefore revisits this factor and highlights that the CoD can not only degrade performance but sometimes also improve the performance of the clustering algorithm. Instead, this paper points out the significance of noisy and correlated features, which can have a negative effect on FCM performance. To tackle this problem, we resort to a neural network model, the autoencoder, to reduce the dimensionality of the feature space while extracting the most informative features. We conduct several experiments to show the validity of the proposed strategy for the FCM algorithm.
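
As a rough illustration of the two ingredients the abstract names, the sketch below implements textbook FCM and a simple relative-contrast measure that is often used to quantify the concentration of distance. It is not the authors' code or experimental setup: the function names (`fuzzy_c_means`, `relative_contrast`), the fuzzifier default `m=2.0`, the random initialization, and the stopping rule are all assumptions made for this example, and the autoencoder step is not shown.

```python
import numpy as np

def _prototypes(X, U, m):
    # Prototypes are means of the data weighted by memberships raised to m.
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Textbook FCM. Returns (V, U): prototypes (c, d) and memberships (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        V = _prototypes(X, U, m)
        # Squared Euclidean distance from every point to every prototype.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return _prototypes(X, U, m), U

def relative_contrast(X, query):
    """(d_max - d_min) / d_min for distances from `query` to the rows of X."""
    d = np.linalg.norm(X - query, axis=1)
    return (d.max() - d.min()) / d.min()
```

Calling `relative_contrast` on uniformly random samples of low and high dimensionality typically shows the contrast shrinking as the dimension grows, which is the CoD effect the abstract refers to. Running `fuzzy_c_means` on the same data and inspecting the pairwise distances between the rows of `V` is one way to check whether the prototypes remain distinguishable.
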
Original language: English
Title of host publication: Advances in Computational Intelligence
Subtitle of host publication: 16th International Work-Conference on Artificial Neural Networks, IWANN 2021, Virtual Event, June 16–18, 2021, Proceedings, Part I
Editors: Ignacio Rojas, Gonzalo Joya, Andreu Català
Place of Publication: Cham
Publisher: Springer Nature Switzerland AG
Pages: 89-100
Number of pages: 12
Volume: LNCS/LNTCS 12861
Edition: 1st
ISBN (Electronic): 9783030850302
ISBN (Print): 9783030850296
Publication status: Published - 1 Sep 2021
Event: 16th International Work-Conference on Artificial Neural Networks - Virtual
Duration: 16 Jun 2021 to 18 Jun 2021
Conference number: 16
http://iwann.uma.es/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Nature Switzerland AG
Volume: LNCS/LNTCS 12861
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Work-Conference on Artificial Neural Networks
Abbreviated title: IWANN 2021
City: Virtual
Period: 16/06/21 to 18/06/21
