Exploring K-Means Clustering Efficiency: Accuracy and Computational Time across Multiple Datasets

Authors

  • Iliyas Karim Khan, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Hanita Daud, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Nooraini Zainuddin, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Rajalingam Sokkalingam, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Abdussamad Abdussamad, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Abdus Samad Azad, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Mudassar Iqbal, Department of Mathematical Sciences, Faculty of Basic Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta 87300, Pakistan
  • Mudasar Zafar, School of Mathematics, Actuarial and Quantitative Studies (SOMAQS), Asia Pacific University of Technology & Innovation (APU), Bukit Jalil, 57000 Kuala Lumpur, Malaysia
  • Atta Ullah, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Musarat Elahi, Shaheed Benazir Bhutto Women University Peshawar, Khyber Pakhtunkhwa 00384, Pakistan
  • Ahmad Abubakar Suleiman, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

DOI:

https://doi.org/10.37934/araset.65.1.113

Keywords:

Accuracy, efficiency, K-means clustering, algorithm, dataset

Abstract

In unsupervised machine learning, clustering is a pivotal method of data analysis. However, diverse datasets pose challenges: some algorithms are less effective or take longer to execute on specific data types. The performance of each clustering algorithm depends on both the dataset's sample size and its specific characteristics. Among these algorithms, K-means clustering is a popular choice, so it is essential to evaluate its accuracy and execution time across datasets with different sample sizes and features. This paper assesses the accuracy and efficiency of the K-means clustering algorithm on three distinct datasets that differ in both size and features: seed data, iris data and well log data sourced from GitHub. The Seed dataset represents three different varieties of wheat seeds, the Iris dataset contains measurements of three different iris flower species, and the Well log dataset contains sonic log and gamma ray data. The aim is to analyse how accurately and efficiently the K-means algorithm performs across these datasets. The results show that the K-means algorithm achieves high accuracy and lower computational time on the Well log dataset.
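The kind of evaluation the abstract describes can be illustrated with a minimal sketch. This is not the authors' code or setup: it uses a plain standard-library implementation of Lloyd's K-means, synthetic two-dimensional data in place of the Seed, Iris and Well log datasets, and it assumes accuracy means the best agreement between cluster ids and known class labels over all label mappings. The helper names (`kmeans`, `clustering_accuracy`, `make_blobs`, `evaluate`) are hypothetical.

```python
import itertools
import random
import time


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k randomly chosen points as initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def clustering_accuracy(true_labels, cluster_labels, k):
    """Assumed metric: best agreement over all cluster-id to class-id mappings."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for t, c in zip(true_labels, cluster_labels) if perm[c] == t)
        best = max(best, hits)
    return best / len(true_labels)


def make_blobs(n_per_class, centers, spread=0.5, seed=1):
    """Synthetic stand-in data: one Gaussian blob per class (not the paper's data)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def evaluate(points, true_labels, k):
    """Return (accuracy, elapsed seconds) for one K-means run."""
    start = time.perf_counter()
    cluster_labels = kmeans(points, k)
    elapsed = time.perf_counter() - start
    return clustering_accuracy(true_labels, cluster_labels, k), elapsed


points, truth = make_blobs(50, [(0, 0), (10, 0), (0, 10)])
accuracy, seconds = evaluate(points, truth, k=3)
print(f"accuracy={accuracy:.3f} time={seconds:.4f}s")
```

Running `evaluate` on datasets of different sizes and feature counts would give the accuracy and timing pairs that such a comparison relies on. Timings from a single run are noisy, so averaging over several runs is a reasonable refinement.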


Author Biographies

Iliyas Karim Khan, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

iliyas_22008363@utp.edu.my

Hanita Daud, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

hanita_daud@utp.edu.my

Nooraini Zainuddin, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

aini_zainuddin@utp.edu.my

Rajalingam Sokkalingam, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

raja.sokkalingam@utp.edu.my

Abdussamad Abdussamad, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

abdussamad_22009779@utp.edu.my

Abdus Samad Azad, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

abdus_22009918@utp.edu.my

Mudassar Iqbal, Department of Mathematical Sciences, Faculty of Basic Sciences, Balochistan University of Information Technology, Engineering and Management Sciences (BUITEMS), Quetta 87300, Pakistan

mudassar.iqbal@buitms.edu.pk

Mudasar Zafar, School of Mathematics, Actuarial and Quantitative Studies (SOMAQS), Asia Pacific University of Technology & Innovation (APU), Bukit Jalil, 57000 Kuala Lumpur, Malaysia

mudasar.zafar@apu.edu.my

Atta Ullah, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

atta_22000639@utp.edu.my

Musarat Elahi, Shaheed Benazir Bhutto Women University Peshawar, Khyber Pakhtunkhwa 00384, Pakistan

baigmusarat8@gmail.com

Ahmad Abubakar Suleiman, Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

ahmad_22000579@utp.edu.my

Published

2024-11-25

Section

Articles
