Depression Detection on Mandarin Text through BERT Model
DOI: https://doi.org/10.37934/araset.60.2.295311

Keywords: NLP, Depression, Machine learning, Transformer, BERT

Abstract
Depression is currently one of the most prevalent mental disorders, and its incidence has risen significantly in Malaysia amid the COVID-19 pandemic. While previous studies have demonstrated the potential of artificial intelligence in analysing social media texts to detect signs of depression, most have focused on English textual content. Considering that Mandarin is the second most widely spoken language worldwide, it is worthwhile to explore depression detection techniques specifically tailored to Mandarin textual content. This research examines the effectiveness of the BERT model in text classification, particularly for detecting depression in Mandarin. The study proposes the utilization of the BERT model to analyse social media posts related to depression. The model is trained on the WU3D dataset, which comprises over 2 million text records sourced from Sina Weibo, a prominent Chinese social media platform. Given the dataset's inherent class imbalance, text augmentation techniques were employed to assess whether they would improve model performance. The findings show that the BERT model trained on the original dataset outperformed the model trained on the augmented dataset, suggesting that BERT handles imbalanced datasets effectively and that the augmented data did not introduce novel information during training. Notably, the highest-performing model achieved an accuracy of 88% on the testing dataset.
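To illustrate the kind of pipeline the abstract describes, the sketch below fine-tunes a pre-trained Chinese BERT checkpoint for binary depression classification using the Hugging Face transformers library. The checkpoint name (bert-base-chinese), the Trainer-based setup, the hyperparameters, and the two toy Weibo-style posts are assumptions for illustration; they are not the authors' exact configuration or data-loading procedure for WU3D.

```python
# Minimal sketch: fine-tuning a Chinese BERT for binary depression
# classification. Model choice, hyperparameters, and the in-memory toy
# examples are illustrative assumptions, not the paper's actual setup.
import torch
from torch.utils.data import Dataset
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          Trainer, TrainingArguments)


class WeiboDataset(Dataset):
    """Wraps raw Mandarin posts and 0/1 depression labels for the Trainer."""

    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Hypothetical in-memory examples; in practice these would be loaded
# from the WU3D dump (depressed = 1, non-depressed = 0).
train_texts = ["我最近每天都很难过，什么都不想做", "今天天气真好，和朋友出去玩了"]
train_labels = [1, 0]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese",
                                                      num_labels=2)

args = TrainingArguments(output_dir="bert-depression",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)

trainer = Trainer(model=model,
                  args=args,
                  train_dataset=WeiboDataset(train_texts, train_labels,
                                             tokenizer))
trainer.train()
```

In this setup, evaluating the fine-tuned model on a held-out test split would yield the kind of accuracy figure reported in the abstract; the comparison between the original and augmented training sets would simply swap the train_dataset passed to the Trainer.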