Merging K-means solutions for clustering large datasets
Speaker:
Semhar Michael, South Dakota State University
Date and Time:
Thursday, November 14, 2019 - 4:00pm to 4:30pm
Location:
Fields Institute, Stewart Library
Abstract:
The K-means algorithm is one of the most popular clustering procedures due to its computational speed and intuitive construction. Unfortunately, the application of K-means in its traditional form, based on Euclidean distances, is limited to cases with spherical clusters of approximately equal size. At the same time, it is common practice to use the algorithm without checking these underlying assumptions, which leads to meaningless or misleading solutions. We propose merging solutions obtained by K-means to produce meaningful groupings. The notion of pairwise overlap is used to measure the closeness of the groups in the obtained solution. The ideas are illustrated through examples and real data, with good results.
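
As a rough illustration of the merging idea, the Python sketch below joins K-means clusters whose pairwise overlap is large. It is not the speaker's procedure, and everything specific in it is an assumption made for the example: each K-means cluster is treated as a spherical Gaussian with equal weight and a common standard deviation sigma, the overlap of two clusters is taken to be the sum of the two misclassification probabilities under that model, 2 * Phi(-d / (2 * sigma)) with d the distance between centers, and clusters are joined by single linkage whenever the overlap exceeds a user-chosen threshold. The names pairwise_overlap and merge_by_overlap, the example centers, and the threshold value are all hypothetical.

```python
import math
from itertools import combinations


def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pairwise_overlap(center_a, center_b, sigma):
    """Overlap of two equal-weight spherical Gaussians sharing one sigma.

    Assumed definition: sum of the two misclassification probabilities,
    which for this model equals 2 * Phi(-d / (2 * sigma)).
    """
    d = math.dist(center_a, center_b)
    return 2.0 * normal_cdf(-d / (2.0 * sigma))


def merge_by_overlap(centers, sigma, threshold):
    """Single-linkage merge of clusters whose overlap exceeds threshold.

    Returns one merged-group label per original K-means cluster.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(centers)), 2):
        if pairwise_overlap(centers[i], centers[j], sigma) > threshold:
            parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(centers))]


# Hypothetical K-means centers: three split one elongated group, one is far away.
centers = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (20.0, 20.0)]
groups = merge_by_overlap(centers, sigma=1.0, threshold=0.1)
print(groups)  # [0, 0, 0, 1]
```

In this toy setting the three nearby centers are merged into one group while the distant center stays on its own. In practice sigma would have to be estimated from the data (for example from the pooled within-cluster variance), and both the overlap definition and the linkage rule are choices that the talk may make differently.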