Sentiment Polarities Detection From Large Heterogeneous Textual Datasets Using Natural Language Processing
DOI: https://doi.org/10.37934/araset.60.1.137149
Keywords: Textual datasets, big data, heterogeneous data
Abstract
Text-based activities are performed every day, and together they produce a huge volume of textual data every second. Because this data grows very fast, handling it creates many challenges for data analysts and stakeholders. Retrieving information about user actions, emotions, and sentiments helps predict future growth and supports decision making. Extracting information on sentiment polarities, expressions, and concerns from large textual datasets is therefore critical. To address these issues, practitioners have implemented many approaches, such as processing textual data and finding trends from surveys of, and interviews with, potential users. The most common preprocessing approach for textual datasets is natural language processing (NLP), which comprises multiple steps such as tokenization, lemmatization, stemming, and part-of-speech removal. The performance of information retrieval techniques plays an important role in big data analytics, so these techniques must be applied carefully at implementation time. In this paper, we used two heterogeneous textual datasets, ACE2020 and Sarcasm, to detect polarities (positive and negative) in daily emotional dialogs, daily actions, and the emotions of unique words. Experiments for preprocessing and polarity detection were conducted in a Python notebook. The results show that both positive and negative emotions have a strong effect on decision making. NLP shows promising performance in detecting polarities from the larger datasets, with better precision, recall, and accuracy scores.
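The abstract does not give the authors' implementation. As a rough, standard-library-only illustration of the preprocessing steps it lists (tokenization, lemmatization, stemming, part-of-speech removal) and of the precision, recall, and accuracy scores it reports, the sketch below runs a toy lexicon-based polarity pipeline on a few invented sentences. Everything in it is an assumption for illustration: the stop-word list (standing in for removing function-word parts of speech, in place of a real POS tagger), the lemma table, the suffix-stripping stemmer, the polarity lexicon, the tie-breaking rule, and the sample sentences are hypothetical and are not taken from ACE2020, the Sarcasm dataset, or the paper.

```python
import re
from typing import List, Tuple

# Hypothetical function-word list, standing in for part-of-speech removal.
STOP_WORDS = {"a", "an", "the", "be", "i", "me", "you", "it", "this",
              "that", "and", "but", "what", "to", "of"}

# Hypothetical lemma lookup table (a real pipeline would use a lemmatizer).
LEMMAS = {"is": "be", "am": "be", "are": "be", "was": "be", "were": "be",
          "better": "good", "best": "good", "worse": "bad", "worst": "bad"}

# Crude suffix-stripping stemmer; suffixes are tried in this order.
SUFFIXES = ("ing", "ed", "es", "ly", "e", "s")

def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical polarity lexicon, stemmed with the same stemmer as the input.
POSITIVE = {stem(w) for w in ["love", "great", "good", "enjoy", "happy"]}
NEGATIVE = {stem(w) for w in ["hate", "bad", "sad", "terrible", "angry"]}

def preprocess(text: str) -> List[str]:
    tokens = re.findall(r"[a-z]+", text.lower())          # 1. tokenization
    lemmas = [LEMMAS.get(t, t) for t in tokens]            # 2. lemmatization
    content = [t for t in lemmas if t not in STOP_WORDS]   # 3. function-word removal
    return [stem(t) for t in content]                      # 4. stemming

def classify(text: str) -> str:
    tokens = preprocess(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Assumed binary rule: ties (including no lexicon hits) count as positive.
    return "positive" if score >= 0 else "negative"

def evaluate(gold: List[str], pred: List[str],
             positive: str = "positive") -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) with `positive` as the target class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy

# Invented example sentences with hand-assigned gold labels.
SAMPLES = [
    ("I loved this movie, it was great", "positive"),
    ("What a terrible and sad day", "negative"),
    ("The service was bad but the food was good", "negative"),
    ("Enjoying the sunny weather", "positive"),
    ("I hate waiting, it makes me angry", "negative"),
]

if __name__ == "__main__":
    gold = [label for _, label in SAMPLES]
    pred = [classify(text) for text, _ in SAMPLES]
    p, r, a = evaluate(gold, pred)
    print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f}")
    # prints: precision=0.667 recall=1.000 accuracy=0.800
```

The mixed third sentence scores a tie and is labeled positive, which is the single false positive behind the 0.667 precision. It shows how a simple lexicon method can misread mixed or sarcastic text, the kind of case a dataset such as Sarcasm is meant to expose.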