Comparison of Pre-Defined Automatic Machine Learning (AutoML) for MBTI Personality Prediction of Twitter Users using Binary Classification Approach

Selvi Fitria Khoerunnisa; Farikhin; Bayu Surarso; Ahmad Ainun Herlambang; Retno Kusumaningrum

doi:10.37934/araset.62.1.106118

Authors

Selvi Fitria Khoerunnisa, School of Postgraduate Studies, Universitas Diponegoro, Semarang Selatan, Semarang City, Central Java 50241, Indonesia
Farikhin, Department of Mathematics, Faculty of Science and Mathematics, Universitas Diponegoro, Semarang Selatan, Semarang City, Central Java 50241, Indonesia
Bayu Surarso, Department of Mathematics, Faculty of Science and Mathematics, Universitas Diponegoro, Semarang Selatan, Semarang City, Central Java 50241, Indonesia
Ahmad Ainun Herlambang, Faculty of Engineering and Information Technology, University of Melbourne, Parkville VIC 3052, Australia
Retno Kusumaningrum, Department of Informatics, Faculty of Science and Mathematics, Universitas Diponegoro, Semarang Selatan, Semarang City, Central Java 50241, Indonesia

DOI:

https://doi.org/10.37934/araset.62.1.106118

Keywords:

Automatic machine learning, Personality prediction, MBTI, Binary classification

Abstract

The Myers-Briggs Type Indicator (MBTI) is a globally used personality test for identifying personality types. MBTI uses a four-factor linear model to characterize a person's behaviour patterns. Its results are often used to pursue career opportunities, make decisions, manage leadership, and deal with stress. MBTI personality prediction has been widely studied and has performed well with Recurrent Neural Networks (RNNs) trained on Twitter data, because tweets indirectly reveal much of a user's personality. However, building RNN-based solutions requires deep expertise, so producing a good model architecture and a suitable set of parameters takes considerable time and resources. Therefore, this study proposes an Automatic Machine Learning (AutoML) method with a pre-defined search space to determine the appropriate model architecture and hyperparameters based on the results of data analysis. This lets the search algorithm concentrate on configurations that are generally suitable. Two pre-defined search spaces are employed in this study: (i) two RNN algorithms, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), and (ii) pre-trained Word2Vec as the word embedding. In addition, this study compares the performance of models trained on preprocessed data with that of models trained on raw data (without preprocessing). The results show, first, that preprocessing increases the F1-score for LSTM and GRU by 2.35% and 2.02%, respectively. Second, LSTM outperforms GRU by 0.35% in F1-score and 0.76% in accuracy. Implementing LSTM with preprocessed data in pre-defined AutoML, with Word2Vec as the word embedding technique, can provide good performance on long and complex data sequences such as Twitter data for predicting the personality of its users.
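The binary classification approach named in the title and the pre-defined search space described in the abstract can be illustrated with a minimal Python sketch. It assumes the four-letter MBTI type is split into its four standard dichotomies (I/E, N/S, T/F, J/P), each treated as a separate binary task. The names `mbti_to_binary`, `SEARCH_SPACE`, and `enumerate_configs`, as well as the `units` values, are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from itertools import product

# The four MBTI dichotomies; each one becomes a separate binary task.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def mbti_to_binary(mbti_type):
    """Map a four-letter MBTI type to four 0/1 labels, one per dichotomy.

    The first letter of each pair is encoded as 1 and the second as 0
    (an arbitrary convention assumed for this sketch).
    """
    code = mbti_type.strip().upper()
    if len(code) != 4:
        raise ValueError(f"expected a four-letter MBTI type, got {mbti_type!r}")
    labels = {}
    for letter, (first, second) in zip(code, AXES):
        if letter not in (first, second):
            raise ValueError(f"invalid letter {letter!r} in {mbti_type!r}")
        labels[first + second] = 1 if letter == first else 0
    return labels


# Hypothetical pre-defined search space. Only the cell types, the Word2Vec
# embedding, and the preprocessing comparison come from the abstract; the
# "units" values are assumptions made for illustration.
SEARCH_SPACE = {
    "cell": ["LSTM", "GRU"],
    "embedding": ["word2vec"],
    "units": [64, 128],
    "preprocessing": [True, False],
}


def enumerate_configs(space):
    """List every configuration in the search space as a dictionary."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[key] for key in keys))]


if __name__ == "__main__":
    print(mbti_to_binary("INTJ"))
    for config in enumerate_configs(SEARCH_SPACE):
        print(config)
```

Restricting the search to a small, explicit space such as this one is what keeps the search inexpensive: every candidate is a configuration already judged plausible, so each binary task only needs to be evaluated against a handful of models.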