High-dimensional Data Clustering with Fuzzy C-Means: Problem, Reason, and Solution

Yinghua Shen, Hanyu E, Tianhua Chen, Zhi Xiao, Bingsheng Liu, Yuan Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Fuzzy C-Means (FCM) clustering algorithm is a popular unsupervised learning approach that has been used extensively in various domains. In this study, however, we point out a major problem that FCM faces when it is applied to high-dimensional data: quite often, the obtained prototypes (cluster centers) cannot be distinguished from one another. Many studies have claimed that the concentration of the distance (CoD) could be a major reason for this phenomenon. This paper therefore revisits this factor and highlights that the CoD does not only lead to decreased performance; it can sometimes also contribute positively to the performance of the clustering algorithm. Instead, this paper points out the significance of noisy and correlated features, which can have a negative effect on FCM performance. To tackle this problem, we resort to a neural network model, the autoencoder, to reduce the dimensionality of the feature space while extracting the most informative features. We conduct several experiments to show the validity of the proposed strategy for the FCM algorithm.
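To make the problem statement concrete, the following is a minimal NumPy sketch of the standard FCM updates, run once on two informative features and once with many noisy features appended. It is not the authors' code or experimental setup: the function names (`fuzzy_c_means`, `min_prototype_distance`), the synthetic data, the fuzzifier `m=2.0`, and all sizes are assumptions chosen for illustration, and the autoencoder step is not shown (its encoder output would simply take the place of `X`).

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM updates; returns (prototypes V, memberships U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Prototype update: membership-weighted mean of the data.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, shape (n_samples, n_clusters).
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

def min_prototype_distance(V):
    """Smallest Euclidean distance between any two prototypes."""
    diffs = V[:, None, :] - V[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=2))
    return d[np.triu_indices(len(V), k=1)].min()

# Hypothetical data: two clusters in 2 informative features,
# plus 500 purely noisy features (all values are assumptions).
rng = np.random.default_rng(42)
informative = np.vstack([
    rng.normal(-5.0, 1.0, (100, 2)),
    rng.normal(5.0, 1.0, (100, 2)),
])
noise = rng.normal(0.0, 5.0, (200, 500))
X_high = np.hstack([informative, noise])

for name, X in [("informative only", informative),
                ("with noisy features", X_high)]:
    V, U = fuzzy_c_means(X, n_clusters=2)
    print(name, "-> min prototype distance:", min_prototype_distance(V))
```

Comparing the two printed distances gives a rough indication of how well separated the prototypes remain once noisy features dominate the distance computation. The outcome depends on the assumed noise level and dimensionality, so it should be read as an illustration rather than a reproduction of the paper's experiments.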
Original language: English
Title of host publication: 16th International Work-Conference on Artificial Neural Networks
Publication status: Accepted/In press - 11 Jun 2021
Event: 16th International Work-Conference on Artificial Neural Networks - Virtual
Duration: 16 Jun 2021 to 18 Jun 2021
Conference number: 16

Conference

Conference: 16th International Work-Conference on Artificial Neural Networks
Abbreviated title: IWANN 2021
City: Virtual
Period: 16/06/21 to 18/06/21
