Comparing the Effectiveness and Efficiency of Machine Learning Models for Spam Detection on Twitter

Authors

  • Stephanie Chua Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
  • Amy Tan Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
  • Puteri Nor Ellyza Nohuddin Higher Colleges of Technology, Sharjah Women’s College, 79799 Abu Dhabi, United Arab Emirates
  • Mohd Hanafi Ahmad Hijazi Faculty of Computing and Informatics, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah, Malaysia

DOI:

https://doi.org/10.37934/araset.61.2.127138

Keywords:

Twitter, spam, text mining, machine learning

Abstract

This research presents a comprehensive study of the effectiveness and efficiency of machine learning models for Twitter spam detection. Spam detection on social media platforms is vital for user experience, but it also poses computational challenges because Twitter data is vast and dynamic. The investigation covered five machine learning models: Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), k-Nearest Neighbours (KNN), and Decision Trees (DT). Their performance was examined along two dimensions: classification accuracy and computational efficiency, the latter measured as the time taken for model execution. NB and LR emerged as the most computationally efficient models, with execution times ranging from 1.016 to 1.949 seconds. These models offered an attractive balance between speed and accuracy, making them suitable for real-time or resource-constrained applications. SVM, LR, KNN and DT were effective in classification, with a performance of 98%. However, the SVM models demanded longer execution times, ranging from 7.670 to 37.657 seconds. KNN and DT struck a balance between accuracy and efficiency, with execution times ranging from 2.852 to 10.941 seconds and 1.080 to 2.442 seconds, respectively. The findings underscore the importance of considering both model effectiveness and computational efficiency when selecting a Twitter spam detection model. By offering a comparative assessment of these models, this study equips researchers to make informed decisions in Twitter spam detection and highlights the trade-offs between model performance and efficiency, paving the way for more effective and resource-conscious approaches to combating spam on social media platforms.
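The two dimensions compared in the abstract, classification accuracy and execution time, can be measured with a small timing harness. The sketch below is illustrative only and is not the authors' implementation: `TinyNaiveBayes`, `benchmark`, and the toy tweets are hypothetical names and data introduced here, and the study's actual dataset, features, and libraries are not stated on this page. It assumes that "execution time" means the wall-clock time for one fit-and-predict run.

```python
import math
import time
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, texts):
        total_docs = sum(self.class_counts.values())
        predictions = []
        for text in texts:
            scores = {}
            for label, n_docs in self.class_counts.items():
                counts = self.word_counts[label]
                denom = sum(counts.values()) + len(self.vocab)
                # Log prior plus smoothed log likelihood of each word.
                score = math.log(n_docs / total_docs)
                for word in text.lower().split():
                    score += math.log((counts[word] + 1) / denom)
                scores[label] = score
            predictions.append(max(scores, key=scores.get))
        return predictions


def benchmark(model, train_x, train_y, test_x, test_y):
    """Return (accuracy, seconds) for one fit-and-predict run (assumed timing scope)."""
    start = time.perf_counter()
    predicted = model.fit(train_x, train_y).predict(test_x)
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predicted, test_y))
    return correct / len(test_y), elapsed


# Hypothetical toy tweets, not the study's dataset.
train_x = [
    "win free prize click link now",
    "free followers click here",
    "claim your free gift card now",
    "meeting friends for lunch today",
    "great football match last night",
    "reading a good book this weekend",
]
train_y = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_x = ["click now for a free prize", "lunch with friends this weekend"]
test_y = ["spam", "ham"]

accuracy, seconds = benchmark(TinyNaiveBayes(), train_x, train_y, test_x, test_y)
print(f"accuracy={accuracy:.2f} time={seconds:.6f}s")
```

Any classifier exposing `fit` and `predict` in this shape could be passed to `benchmark`, which is how several models might be compared on the same split. Timings from a toy run like this say nothing about the figures reported in the abstract.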

Author Biographies

Stephanie Chua, Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia

chlstephanie@unimas.my

Amy Tan, Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia

65428@siswa.unimas.my

Puteri Nor Ellyza Nohuddin, Higher Colleges of Technology, Sharjah Women’s College, 79799 Abu Dhabi, United Arab Emirates

pnohuddin@hct.ac.ae 

Mohd Hanafi Ahmad Hijazi, Faculty of Computing and Informatics, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Sabah, Malaysia

hanafi@ums.edu.my

Published

2024-10-08

Section

Articles