Comparison of Pre-Defined Automatic Machine Learning (AutoML) for MBTI Personality Prediction of Twitter Users using Binary Classification Approach
DOI:
https://doi.org/10.37934/araset.62.1.106118
Keywords:
Automatic machine learning, Personality prediction, MBTI, Binary classification
Abstract
The Myers-Briggs Type Indicator (MBTI) is a globally accepted personality test used to identify personality types. MBTI uses a four-factor linear model to characterize a person's behaviour patterns. Its results are often used to pursue career opportunities, make decisions, manage leadership, and deal with stress. MBTI personality prediction has been widely studied and has performed well using Recurrent Neural Networks (RNNs) on Twitter data, because tweets indirectly reveal much of a user's personality. However, building RNN-based solutions requires deep expertise, so producing a good model architecture and tuning its parameters takes considerable time and resources. Therefore, this study proposes an Automatic Machine Learning (AutoML) method with a pre-defined search space to determine a suitable model architecture and hyperparameters based on the results of data analysis. This lets the search algorithm exploit a search space restricted to generally suitable configurations. Two pre-defined search spaces are employed in this study: (i) two RNN algorithms, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), and (ii) pre-trained Word2Vec as the word embedding. In addition, this study compares model performance on preprocessed data and on raw data (without preprocessing). First, preprocessing increases the F1-score of LSTM and GRU by 2.35% and 2.02%, respectively. Second, LSTM outperforms GRU by 0.35% in F1-score and 0.76% in accuracy. LSTM with preprocessed data in pre-defined AutoML, with Word2Vec as the word embedding technique, can provide good performance on long and complex data sequences such as Twitter data for predicting users' personality.
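As a rough illustration of the binary classification framing and the pre-defined search space described above, the following minimal Python sketch splits a four-letter MBTI type into four binary labels and enumerates the candidate configurations (LSTM or GRU, Word2Vec embedding, with or without preprocessing). The names `mbti_to_binary` and `candidate_configs`, the 0/1 assignment for each letter pair, and the dictionary layout are assumptions made for illustration and are not taken from the study.

```python
from itertools import product

# The four MBTI dichotomies. Mapping the first letter of each pair to 0 and
# the second to 1 is an arbitrary illustrative choice, not the paper's encoding.
DICHOTOMIES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def mbti_to_binary(mbti_type):
    """Convert a four-letter MBTI type into four binary labels, one per dichotomy."""
    mbti_type = mbti_type.upper()
    if len(mbti_type) != 4:
        raise ValueError("MBTI type must have exactly four letters")
    labels = []
    for letter, (zero, one) in zip(mbti_type, DICHOTOMIES):
        if letter == zero:
            labels.append(0)
        elif letter == one:
            labels.append(1)
        else:
            raise ValueError(f"unexpected letter {letter!r} in {mbti_type!r}")
    return labels


# Hypothetical description of the pre-defined search space from the abstract:
# two recurrent cell types, one fixed embedding, preprocessing on or off.
CELLS = ["LSTM", "GRU"]
PREPROCESSING = [True, False]


def candidate_configs():
    """List every combination of recurrent cell and preprocessing setting."""
    return [
        {"cell": cell, "embedding": "Word2Vec", "preprocess": flag}
        for cell, flag in product(CELLS, PREPROCESSING)
    ]


if __name__ == "__main__":
    print(mbti_to_binary("INTJ"))
    for config in candidate_configs():
        print(config)
```

Under this framing, each of the four labels would be the target of its own binary classifier, and each candidate configuration would be trained and scored (for example by F1-score and accuracy) before the best one is selected.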