Sentiment Polarities Detection From Large Heterogeneous Textual Datasets Using Natural Language Processing

Ganesh Kumar Kumar; Shuib Basri; Abdullahi Abubakar Imam; Sunder Ali Khuwaja; Abdullateef Oluwagbemiga Balogun; Hussaini Mamman

doi:10.37934/araset.59.2.8092

Authors

Ganesh Kumar Kumar School of Engineering and Technology, Sunway University, No. 5, Jalan Universiti, Bandar Sunway, 47500 Selangor Darul Ehsan, Malaysia
Shuib Basri School of Engineering and Technology, Sunway University, No. 5, Jalan Universiti, Bandar Sunway, 47500 Selangor Darul Ehsan, Malaysia
Abdullahi Abubakar Imam School of Digital Sciences, Universiti Brunei Darussalam, BE1410, Brunei Darussalam
Sunder Ali Khuwaja Department of Telecommunication Engineering, Faculty of Engineering and Technology, University of Sindh, Jamshoro 76090, Pakistan
Abdullateef Oluwagbemiga Balogun School of Engineering and Technology, Sunway University, No. 5, Jalan Universiti, Bandar Sunway, 47500 Selangor Darul Ehsan, Malaysia
Hussaini Mamman School of Engineering and Technology, Sunway University, No. 5, Jalan Universiti, Bandar Sunway, 47500 Selangor Darul Ehsan, Malaysia

Keywords:

Textual datasets, big data, heterogeneous data

Abstract

Text-based activities are performed every day, producing a huge volume of textual data every second. Because this data grows rapidly, handling it creates many challenges for data analysts and stakeholders. Retrieving information about user actions, emotions, and sentiments helps predict future growth and supports decision making. Extracting information on sentiment polarities, expressions, and concerns from large textual datasets is therefore critical. Practitioners have addressed these issues with many approaches, such as processing textual data and identifying trends from surveys and interviews with potential users. The most common preprocessing approach for textual datasets is natural language processing (NLP), which comprises multiple steps such as tokenization, lemmatization, stemming, and part-of-speech removal. The performance of information retrieval techniques plays an important role in big data analytics, so these techniques must be applied appropriately when implemented. In this paper, we use two heterogeneous textual datasets, ACE2020 and Sarcasm, to detect polarities (positive and negative) in daily emotional dialogs, daily actions, and the emotions of unique words. Experiments were conducted in a Python notebook for preprocessing and polarity detection. The results show that both positive and negative emotions have a strong effect on decision making, and that NLP-based polarity detection on the larger datasets is promising in terms of precision, recall, and accuracy.
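The abstract names the pipeline stages but not their implementation, so the following is only a minimal, standard-library Python sketch of a generic version of those stages: tokenization, stop-word removal (used here as a stand-in for the "part-of-speech removal" step), crude suffix stemming, lexicon-based positive/negative labelling, and precision/recall/accuracy scoring. The stop-word list, sentiment lexicon, suffix rules, toy sentences and the function names `stem`, `preprocess`, `polarity` and `evaluate` are all hypothetical assumptions; this is not the authors' method or data.

```python
import re

# Hypothetical resources for illustration only; they are NOT the stop-word
# list or sentiment lexicon used in the paper.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "i", "it", "this", "to", "of"}
SUFFIXES = ("ing", "ed", "ly", "s")


def stem(token: str) -> str:
    """Crude suffix-stripping stemmer, a stand-in for e.g. the Porter stemmer."""
    for suffix in SUFFIXES:
        # Only strip a suffix if at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Lexicon entries are stemmed with the same function so they match stemmed tokens.
POSITIVE = {stem(w) for w in ("good", "great", "love", "happy", "enjoy")}
NEGATIVE = {stem(w) for w in ("bad", "terrible", "hate", "sad", "awful")}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on runs of letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def polarity(text: str) -> str:
    """Label text by the count of positive minus negative lexicon hits."""
    tokens = preprocess(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def evaluate(gold, predicted, positive_label="positive"):
    """Precision and recall for one class, plus overall accuracy."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive_label and p == positive_label for g, p in pairs)
    fp = sum(g != positive_label and p == positive_label for g, p in pairs)
    fn = sum(g == positive_label and p != positive_label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy examples, not drawn from the ACE2020 or Sarcasm datasets.
    texts = [
        "I really enjoyed this movie, it was great!",
        "The food was bad and the service was terrible.",
        "Happy to see the sunny weather today",
        "Oh great, another Monday morning meeting.",  # sarcastic
    ]
    gold = ["positive", "negative", "positive", "negative"]
    predicted = [polarity(t) for t in texts]
    print(predicted)
    print(evaluate(gold, predicted))
```

On these toy sentences the sketch labels the sarcastic fourth sentence as positive because of the word "great", giving precision 2/3, recall 1.0 and accuracy 0.75 for the positive class. This illustrates why sarcastic text is difficult for a purely lexicon-based approach. A practical pipeline would normally replace the hand-written stemmer and stop-word list with an established NLP library's tokenizer, stemmer or lemmatizer.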