Egorkin Anton Aleksandrovich (PhD student, Russian State Social University (RSSU))
The paper examines the application of the k-means clustering method to power-law-distributed data. Using a data set of financial transactions as an example, clustering was performed with the k-means method, and the number of clusters was chosen by optimizing the silhouette coefficient.
The article shows that when the logarithms of the source data are used as input to the k-means algorithm, clustering quality improves: clusters become more homogeneous and the intra-cluster variance decreases. It is proved that, in the one-dimensional case, clustering on log-transformed data is carried out around the geometric means of the clusters, because the arithmetic mean of the logarithms is the logarithm of the geometric mean. The clustering results do not depend on the base of the logarithm applied to the source data. The paper also demonstrates the need for clustering quality metrics that are not based on the Euclidean or city-block distance when working with power-law-distributed data.
Keywords: clustering, k-means algorithm, power-law distribution, silhouette coefficient
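
The two one-dimensional claims in the abstract (centroids in log space correspond to geometric means, and the result does not depend on the logarithm base) can be illustrated with a minimal sketch. This is not the author's code: the `kmeans_1d` helper, its deterministic initialisation, and the `amounts` values are all hypothetical, chosen only to make the example small and reproducible.

```python
import math
import statistics


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on a list of floats; returns (labels, centroids)."""
    srt = sorted(values)
    # Assumption: deterministic start from k evenly spaced order statistics.
    centroids = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid by absolute distance.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: arithmetic mean of each cluster's members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members)
                                 if members else centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical transaction amounts spanning several orders of magnitude.
amounts = [12, 15, 20, 35, 50, 400, 650, 900, 1500, 30000, 45000, 250000]

raw_labels, raw_centroids = kmeans_1d([float(a) for a in amounts], k=3)
ln_labels, ln_centroids = kmeans_1d([math.log(a) for a in amounts], k=3)
lg_labels, lg_centroids = kmeans_1d([math.log10(a) for a in amounts], k=3)

# A centroid in natural-log space, mapped back with exp, is the geometric
# mean of the original values assigned to that cluster.
geo_means = []
for j in range(3):
    members = [a for a, lab in zip(amounts, ln_labels) if lab == j]
    geo_means.append(statistics.geometric_mean(members))

print("raw labels:  ", raw_labels)
print("ln labels:   ", ln_labels)
print("log10 labels:", lg_labels)
print("exp(ln centroids):", [math.exp(c) for c in ln_centroids])
print("geometric means:  ", geo_means)
```

On these made-up values, k-means on the raw amounts places everything from 12 to 1500 in a single cluster and isolates the largest amount, while the log-transformed input separates the three orders of magnitude. The natural-log and base-10 runs produce the same labels, since changing the base only rescales all distances by a constant factor.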
Citation link: Egorkin A. A. FEATURES OF USING THE K-MEANS CLASSIFICATION ALGORITHM FOR DATA SUBJECT TO THE POWER LAW OF DISTRIBUTION // Современная наука: актуальные проблемы теории и практики. Серия: Естественные и Технические Науки. -2023. -№09. -С. 65-69 DOI 10.37882/2223-2966.2023.09.07