Exploring K-Means Clustering Efficiency: Accuracy and Computational Time across Multiple Datasets
DOI:
https://doi.org/10.37934/araset.65.1.113
Keywords:
Accuracy, efficiency, k-mean clustering, algorithm and dataset
Abstract
In unsupervised machine learning, clustering is a pivotal method of data analysis. However, datasets vary widely, and some algorithms are less effective or take longer to run on particular data types. The performance of each clustering algorithm depends on both the dataset's sample size and its specific characteristics. Among these algorithms, K-means clustering is a popular choice, so it is essential to evaluate its accuracy and execution time across datasets with different sample sizes and features. This paper assesses the accuracy and efficiency of the K-means clustering algorithm on three distinct datasets, namely Seed data, Iris data and Well log data sourced from GitHub, which differ in both size and features. The Seed dataset represents three varieties of wheat seeds, the Iris dataset contains measurements of three iris flower species, and the Well log dataset contains sonic log and gamma ray data. The aim is to analyse how accurately and efficiently the K-means algorithm performs across these datasets. The results show that the K-means algorithm achieves high accuracy and lower computational time on the Well log dataset.
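The kind of evaluation described in the abstract can be illustrated with a minimal sketch that runs K-means on labelled data, times the run, and scores the result. Everything below is an assumption made for illustration rather than the paper's procedure: the toy points and labels are hypothetical, the farthest-first seeding is swapped in only to keep the run deterministic, and accuracy is taken as a majority-vote score (each cluster is credited with its most common true label), since the abstract does not state how accuracy was computed.

```python
import time
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-first seeding (an assumption, for determinism)."""
    # Seeding: start from the first point, then repeatedly add the point
    # that is farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not move
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_accuracy(true_labels, cluster_labels):
    """Majority-vote accuracy: credit each cluster with its most common true label."""
    correct = 0
    for c in set(cluster_labels):
        members = [t for t, lab in zip(true_labels, cluster_labels) if lab == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(true_labels)


def timed_kmeans(points, truth, k):
    """Run K-means once and return (labels, accuracy, elapsed seconds)."""
    start = time.perf_counter()
    labels = kmeans(points, k)
    elapsed = time.perf_counter() - start
    return labels, cluster_accuracy(truth, labels), elapsed


# Hypothetical toy data: three well-separated groups of three 2-D points.
POINTS = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.9),
]
TRUTH = [0, 0, 0, 1, 1, 1, 2, 2, 2]

labels, accuracy, seconds = timed_kmeans(POINTS, TRUTH, k=3)
print(f"accuracy={accuracy:.3f}, time={seconds:.6f}s")
```

Applying the same `timed_kmeans` call to each dataset in turn would give one accuracy and one timing figure per dataset, which is the comparison the abstract describes. A single timing is noisy, so repeating each run several times and reporting the mean would be a reasonable refinement.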