Big Data: Issues and Challenges in Clustering Data Visualization

Ummu Hani’ Hair Zaki; Izyan Izzati Kamsani; Ahmad Firdaus Ahmad Fadzil; Zainura Idrus; Eser Kandogan

doi:10.37934/araset.51.1.150159

Authors

Ummu Hani’ Hair Zaki, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
Izyan Izzati Kamsani, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
Ahmad Firdaus Ahmad Fadzil, College of Computing, Informatics and Media, Universiti Teknologi Mara Cawangan Melaka (Kampus Jasin), 77300 Merlimau, Melaka, Malaysia
Zainura Idrus, Faculty of Computer and Mathematical Science, Universiti Teknologi Mara, 40450 Shah Alam, Selangor, Malaysia
Eser Kandogan, Megagon Labs, 444 Castro St #900, Mountain View, CA 94041, United States

DOI:

https://doi.org/10.37934/araset.51.1.150159

Keywords:

Big data, Clustering visualization, Geometric projection, Star coordinate

Abstract

In the era of big data, the continuous generation of data across many fields has produced large and complex datasets. These datasets come in diverse formats and structures, including unstructured and semi-structured data. Although big data is widely available, its high dimensionality remains a significant obstacle to analysing and understanding it. Clustering analysis plays a crucial role in data analysis and visualization by uncovering hidden patterns and structures within datasets. However, several challenges limit its effectiveness: data dimensionality, the selection of an appropriate clustering algorithm, the determination of the optimal number of clusters, the interpretation of results, and the handling of outliers. This paper explores these challenges and presents visualization techniques suited to visualizing and interpreting clustering results. By addressing these challenges and employing effective visualization techniques, researchers and practitioners can better understand and apply clustering analysis.