Clustering on Sentiment Analysis: Effect of Twitter Dataset

Authors

  • Sri Redjeki, Indonesia Digital Technology University, Yogyakarta, Indonesia
  • Satria Abadi, Faculty of Computing and Meta Technology, Universiti Pendidikan Sultan Idris, Perak, Malaysia
  • Deborah Kurniati, Indonesia Digital Technology University, Yogyakarta, Indonesia
  • Sri Rezeki Candra Nursari, Universitas Pancasila Jakarta, Indonesia
  • Ariesta Damayanti, Indonesia Digital Technology University, Yogyakarta, Indonesia
  • Edi Iskandar, Indonesia Digital Technology University, Yogyakarta, Indonesia

DOI:

https://doi.org/10.37934/araset.51.1.3951

Keywords:

Auto Labeling, Clustering, Deep Learning, LSTM, Sentiment Analysis

Abstract

Labeling text datasets is a challenge in sentiment analysis, especially when it is done manually, because it takes time, effort, and skill; this burden is particularly heavy for Twitter data. This study auto-labels a Twitter dataset using a clustering approach and then classifies tourism sentiment on Twitter with LSTM (Long Short-Term Memory), a deep learning algorithm. K-means is used for the auto-labeling step, and LSTM is used for sentiment classification. The research dataset consists of 10,228 Indonesian-language tweets about tourism in Yogyakarta, Indonesia. LSTM classification is carried out twice: first on the manually labeled dataset, and then on the auto-labeled dataset. Sentiment is divided into three classes: negative, positive, and neutral. The results indicate that classifying tourism sentiment with the auto-labeled dataset gives better accuracy than with the manually labeled dataset. The LSTM model trained on the auto-labeled dataset produces optimal graphs with an average accuracy of 99%, whereas the model trained on the manually labeled dataset produces graphs that show overfitting, with an average accuracy of 40%. These results show that auto-labeling the dataset classes with K-means clustering can improve the accuracy of sentiment classification for Yogyakarta tourism tweets. The model produced in this study can help solve class-labeling problems in sentiment classification.
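The auto-labeling step described in the abstract can be illustrated with a minimal sketch. It assumes term-count features and a plain Lloyd's-algorithm K-means with k = 3; the abstract does not state which features, library, or initialization the authors used, so the helper names `vectorize` and `kmeans` and the toy tweets below are hypothetical. Cluster ids are arbitrary integers, and mapping them to negative, positive, and neutral would need a separate step that the abstract does not describe.

```python
import math
import random
from collections import Counter


def vectorize(texts):
    """Turn each text into a term-count vector over a shared vocabulary."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in texts:
        vec = [0.0] * len(vocab)
        for tok, n in Counter(t.lower().split()).items():
            vec[index[tok]] = float(n)
        vectors.append(vec)
    return vectors


def kmeans(vectors, k=3, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster id (0..k-1) per vector."""
    rng = random.Random(seed)
    # Assumption: centroids start from k randomly chosen data points.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy tweets, not taken from the study's dataset.
tweets = [
    "the beach was beautiful and clean",
    "beautiful temple and friendly guides",
    "the traffic was terrible and slow",
    "terrible parking and dirty streets",
    "the museum opens at nine",
    "the bus leaves at noon",
]
auto_labels = kmeans(vectorize(tweets), k=3)
```

Under these assumptions, `auto_labels` would then take the place of the manual labels when training the classifier.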

Author Biographies

Sri Redjeki, Indonesia Digital Technology University, Yogyakarta, Indonesia

dzeky@utdi.ac.id

Satria Abadi, Faculty of Computing and Meta Technology, Universiti Pendidikan Sultan Idris, Perak, Malaysia

satriaabadi@meta.upsi.edu.my

Deborah Kurniati, Indonesia Digital Technology University, Yogyakarta, Indonesia

debbi@utdi.ac.id

Sri Rezeki Candra Nursari, Universitas Pancasila Jakarta, Indonesia

sri.rezeki.candr.n@univpancasila.ac.id

Ariesta Damayanti, Indonesia Digital Technology University, Yogyakarta, Indonesia

ariesta@utdi.ac.id

Edi Iskandar, Indonesia Digital Technology University, Yogyakarta, Indonesia

edi_iskandar@utdi.ac.id

Published

2024-09-04

Section

Articles