
Clustering by partitioning around medoids using distance-based similarity measures on interval-scaled variables


D.L. Nkweteyim

Abstract

This paper reports the results of a study of the partitioning around medoids (PAM) clustering algorithm applied to four datasets of varying sizes and numbers of clusters, each in both standardized and unstandardized form. In addition to the two more traditional proximity measures, the Euclidean and Manhattan distances, the angular distance was used to compute object-object similarity. The data comprise three widely available datasets and one constructed from publicly available climate data. The results replicate some well-known facts about the PAM algorithm: the quality of the generated clusters tends to be much better for small datasets; the silhouette value is a good, though not perfect, guide to the optimal number of clusters; and human intervention is required to interpret the generated clusters. The results also indicate that the angular distance measure, which has traditionally not been widely used in clustering, outperforms both the Euclidean and Manhattan distance metrics in certain situations.
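As an illustration of the three proximity measures named above, the following is a minimal Python sketch, not the paper's implementation. The abstract does not give a formula for the angular distance; the sketch assumes it is the angle (in radians) between two objects treated as vectors, that is, the arccosine of their cosine similarity. The function names `euclidean`, `manhattan`, and `angular` are hypothetical.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def angular(x, y):
    """Angle in radians between x and y.

    Assumption: angular distance = arccos(cosine similarity). The paper
    may define or normalize it differently. Both vectors must be non-zero.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = dot / (norm_x * norm_y)
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)


if __name__ == "__main__":
    p, q = (1.0, 2.0), (2.0, 4.0)  # q is p scaled by 2
    print(euclidean(p, q), manhattan(p, q), angular(p, q))
```

Under this assumed definition, two objects whose variables differ only by a positive scale factor have an angular distance of (approximately) zero, while their Euclidean and Manhattan distances are non-zero. The angular measure therefore compares the profile of an object across variables rather than its magnitude.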

Keywords: PAM, Euclidean, Manhattan, Angular distance, Silhouette


Journal Identifiers


eISSN: 2437-2110
print ISSN: 0189-9546