TY - JOUR
T1 - A novel data clustering algorithm based on gravity center methodology
AU - Kuwil, Farag Hamed
AU - Atila, Ümit
AU - Abu-Issa, Radwan
AU - Murtagh, Fionn
PY - 2020/10/15
Y1 - 2020/10/15
N2 - The concept of clustering is to separate data into clusters such that similarity is greater within a cluster than among clusters. Similarity consists of two principles, namely connectivity and cohesion. However, in partitional clustering, some algorithms such as K-means and K-medians divide the dataset points according to the first principle (connectivity) based on cluster centroids without any regard to the second principle (cohesion), while others such as K-medoids partially consider cohesion in addition to connectivity. This prevents the discovery of clusters with convex shape, and results are negatively affected by outliers. In this paper, a new Gravity Center Clustering (GCC) algorithm is proposed, which depends on a critical distance (λ) to define the threshold among clusters. The algorithm falls under partitional clustering and is based on the gravity center, a point within a cluster that verifies both connectivity and cohesion in determining the similarity of each point in the dataset. Therefore, the proposed algorithm deals with any shape of data better than K-means, K-medians and K-medoids. Furthermore, the GCC algorithm does not need any parameters beforehand to perform clustering, but it can help the user improve control over clustering results and deal with overlapping and outliers by providing two coefficients and an indicator. In this study, 22 experiments are conducted using different types of synthetic and real healthcare datasets. The results show that the proposed algorithm satisfies the concept of clustering and provides great flexibility to get the optimal solution, especially since clustering is considered an optimization problem.
AB - The concept of clustering is to separate data into clusters such that similarity is greater within a cluster than among clusters. Similarity consists of two principles, namely connectivity and cohesion. However, in partitional clustering, some algorithms such as K-means and K-medians divide the dataset points according to the first principle (connectivity) based on cluster centroids without any regard to the second principle (cohesion), while others such as K-medoids partially consider cohesion in addition to connectivity. This prevents the discovery of clusters with convex shape, and results are negatively affected by outliers. In this paper, a new Gravity Center Clustering (GCC) algorithm is proposed, which depends on a critical distance (λ) to define the threshold among clusters. The algorithm falls under partitional clustering and is based on the gravity center, a point within a cluster that verifies both connectivity and cohesion in determining the similarity of each point in the dataset. Therefore, the proposed algorithm deals with any shape of data better than K-means, K-medians and K-medoids. Furthermore, the GCC algorithm does not need any parameters beforehand to perform clustering, but it can help the user improve control over clustering results and deal with overlapping and outliers by providing two coefficients and an indicator. In this study, 22 experiments are conducted using different types of synthetic and real healthcare datasets. The results show that the proposed algorithm satisfies the concept of clustering and provides great flexibility to get the optimal solution, especially since clustering is considered an optimization problem.
KW - Algorithm
KW - Cluster analysis
KW - Euclidean distance
KW - Gravity center
KW - Partitional clustering
UR - http://www.scopus.com/inward/record.url?scp=85084338374&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2020.113435
DO - 10.1016/j.eswa.2020.113435
M3 - Article
AN - SCOPUS:85084338374
VL - 156
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
M1 - 113435
ER -