Random Dimension Manipulation for Efficient High-Dimensional Data Clustering

Authors

  • Ummu Hani’ Hair Zaki, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
  • Izyan Izzati Kamsani, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
  • Roliana Ibrahim, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
  • Norzehan Sakamat, Faculty of Information Technology and Quantitative Science, Universiti Teknologi Mara, Kampus Dengkil, 43800 Dengkil, Selangor
  • Eser Kandogan, Megagon Labs, 444 Castro St #900, Mountain View, CA 94041, United States

DOI:

https://doi.org/10.37934/araset.51.1.129140

Keywords:

High-Dimensional Data, Clustering, Correlation, Dimensions Dependencies, Dimension Arrangement, Dimension Scaling, Random Dimension Manipulation, Star Coordinates, Visualization

Abstract

High-dimensional data is collected from many sources, fields, and applications, such as medicine, science, and business, to provide useful information. Unfortunately, its complexity makes it difficult to interpret and understand, so sophisticated data analysis is required to extract knowledge from it. This knowledge can be presented through visualization. However, as the volume of data increases, data points can overlap, which clutters the visual presentation and impairs the perception of patterns in high-dimensional data. To overcome this, high-dimensional data can be explored in depth using dimension arrangement and scaling. The arrangement of dimensions is essential because the relationships between dimensions can influence whether an efficient cluster emerges. Dimensions are arranged according to their correlation values, so that more strongly related dimensions are placed next to each other. During clustering, dimensions are scaled in or out. These features are available through the Star Coordinates (SC) technique. This paper conducts an exploratory data analysis in the SC environment, where users can visualize and interact with the data in a low-dimensional visualization space. Using two data sets, it demonstrates the importance of manipulating data dimensions in structuring the layout of the projected space. In conclusion, cluster formation was crucial, and manipulation of data dimensions was essential to structuring the projected space layout. The proposed approach helped users find significant cluster formations by randomizing the scaling and order of dimensions.
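
The ideas in the abstract can be illustrated with a minimal sketch. It assumes the standard Star Coordinates projection, in which each dimension is drawn as an axis radiating from a common origin and each data point is placed at the sum of its normalized values along those axes. The function names, the greedy correlation-based ordering, and the scaling range are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def correlation_order(X):
    """Greedy ordering (an assumed heuristic): start at dimension 0 and keep
    appending the unplaced dimension most correlated with the last one placed.
    Assumes no column of X is constant (corrcoef would yield NaN)."""
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    order = [0]
    remaining = set(range(1, corr.shape[0]))
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: corr[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def star_coordinates(X, order=None, scale=None):
    """Project n x d data to 2-D. order[k] is the dimension drawn on the k-th
    evenly spaced axis; scale[j] is the axis length of dimension j."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    order = list(range(d)) if order is None else list(order)
    scale = np.ones(d) if scale is None else np.asarray(scale, dtype=float)
    # Min-max normalize each dimension to [0, 1]; guard constant columns.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Z = (X - lo) / span
    # Build one 2-D axis vector per dimension.
    angles = 2 * np.pi * np.arange(d) / d
    axes = np.empty((d, 2))
    for k, j in enumerate(order):
        axes[j] = scale[j] * np.array([np.cos(angles[k]), np.sin(angles[k])])
    return Z @ axes

# Synthetic data (hypothetical): dimension 3 is nearly a copy of dimension 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)

# Correlation-based arrangement with unit-length axes.
order = correlation_order(X)
P = star_coordinates(X, order=order)

# Random manipulation: shuffle the axis order and draw random axis lengths.
P_rand = star_coordinates(X, order=rng.permutation(5),
                          scale=rng.uniform(0.2, 1.0, size=5))
```

In this sketch, scaling a dimension out (a longer axis) increases its influence on the layout, and scaling it in reduces that influence. Repeating the random step and inspecting each resulting layout is one way a user might search for an arrangement in which clusters separate clearly.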

Author Biographies

Ummu Hani’ Hair Zaki, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

hanizaki7@gmail.com

Izyan Izzati Kamsani, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

izyanizzati@utm.my

Roliana Ibrahim, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

roliana@utm.my

Norzehan Sakamat, Faculty of Information Technology and Quantitative Science, Universiti Teknologi Mara, Kampus Dengkil, 43800 Dengkil, Selangor

norzehan012@uitm.edu.my

Published

2024-09-04

How to Cite

Ummu Hani’ Hair Zaki, Izyan Izzati Kamsani, Roliana Ibrahim, Norzehan Sakamat, & Kandogan, E. (2024). Random Dimension Manipulation for Efficient High-Dimensional Data Clustering. Journal of Advanced Research in Applied Sciences and Engineering Technology, 51(1), 129–140. https://doi.org/10.37934/araset.51.1.129140

Section

Articles

Most read articles by the same author(s)