Author- Thota Ashavanthini Krishna
Protests are an important way for citizens to voice their demands and grievances to the government. As citizens grow more conscious of their rights, the number of protests around the world has increased. With the growth of technology, the use of social media to exchange information and ideas has also risen dramatically, and millions of tweets are generated every day on a variety of topics. We collected a large dataset of tweets about the Indian farmers' protest, together with data on the users who posted them, to study the sentiment they express. We conducted our sentiment analysis using a Long Short-Term Memory (LSTM) architecture.
In the country's northern regions, farmers in India are protesting against the three farm acts passed by Parliament in September 2020. The three acts are The Farmers' Produce Trade and Commerce (Promotion and Facilitation) Act, The Farmers (Empowerment and Protection) Agreement on Price Assurance and Farm Services Act, and The Essential Commodities (Amendment) Act. The second of these introduces contract farming, in which farmers produce crops in exchange for a mutually agreed remuneration through contracts with corporate investors. Protesting farmers fear that big investors may bind them to disadvantageous contracts drafted by major corporate law firms, with liability clauses that are often beyond the comprehension of poor farmers. The mass protest led to a surge in the number of people around the world sharing their thoughts and feelings about it on social media. Public opinion was diverse, and thousands of individuals from many countries expressed their support for the farmers. Information moves quickly between users in today's digital environment, and it can shape how other users feel about an incident. As a result, it is essential to consider public opinion. Sentiment analysis is a natural language processing-based method for assessing and interpreting human emotions.
Sentiment analysis is a technique for assessing whether a piece of text is positive, negative, or neutral. A sentiment analysis system uses natural language processing (NLP) and machine learning techniques to assign weighted sentiment scores to the entities, topics, themes, and categories within a sentence or phrase. Sentiment analysis helps data analysts at major corporations estimate public sentiment, carry out extensive market research, monitor brand and product reputation, and gain a better understanding of consumer experiences. Data analytics companies commonly integrate third-party sentiment analysis APIs into their customer experience management, social media monitoring, and workforce analytics systems to deliver valuable insights to their customers.
It is estimated that 80 percent of the world's data is unstructured, with no predefined organization. The majority of this unstructured data is text, such as reviews, emails, conversations, social media posts, surveys, and publications. Investigating and analyzing this material by hand can be time-consuming and difficult. A sentiment analysis system allows an organization to make sense of this vast amount of unstructured text by automating the work and saving hours of human processing.
How does Sentiment Analysis work?
The process for basic sentiment analysis of text documents is simple:
Break down each text document into its constituent elements such as sentences, phrases, tokens, and parts of speech.
Identify each phrase and component that carries a sentiment.
Assign each phrase and component a sentiment score from -1 to +1.
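The three steps above can be sketched with a tiny lexicon-based scorer. This is only an illustration under stated assumptions: the LEXICON dictionary and its scores are made up for the example, the function names are hypothetical, and averaging the scores into one document score is an extra step that the list above does not describe.

```python
import re

# Hypothetical toy lexicon: the words and scores are illustrative assumptions.
LEXICON = {"support": 0.6, "great": 0.8, "unfair": -0.7, "fear": -0.5}

def tokenize(text):
    """Step 1: break the text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score_tokens(tokens):
    """Steps 2 and 3: keep sentiment-bearing tokens and attach a score in [-1, +1]."""
    return [(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]

def document_score(text):
    """Average the token scores; 0.0 means no sentiment-bearing token was found."""
    scored = score_tokens(tokenize(text))
    if not scored:
        return 0.0
    return sum(score for _, score in scored) / len(scored)

print(document_score("Great support for the farmers"))   # positive score
print(document_score("Farmers fear unfair contracts"))   # negative score
```

A fixed lexicon like this cannot handle context-dependent phrases, which is where the machine learning approach described next comes in.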
How is machine learning used for sentiment analysis?
The fundamental role of machine learning in sentiment analysis is to improve and simplify low-level text analytics tasks such as part-of-speech tagging. For instance, data scientists can train a machine learning model to recognize nouns by providing it with a large number of text documents containing pre-tagged examples. Using supervised and unsupervised machine learning techniques such as neural networks and deep learning, the model learns to recognize nouns in new text.
Machine learning can also help data analysts in tackling tough language evolution problems. The phrase "sick burn," for example, can have a variety of connotations. It's hard to make a sentiment analysis ruleset that takes into consideration every probable interpretation. Given a few thousand pre-tagged samples, a machine learning model can learn to distinguish between what "sick burn" means in the context of video games and what it means in the context of healthcare.
What is LSTM?
Long short-term memory (LSTM) is a deep learning architecture based on an artificial recurrent neural network (RNN). Hochreiter and Schmidhuber proposed LSTM in 1997, and it was refined and popularised by many people in subsequent work. LSTMs are now widely used and work very well on a wide range of problems. They are explicitly designed to avoid the long-term dependency problem: remembering information for long periods is practically their default behaviour, not something they struggle to learn. Unlike standard feedforward neural networks, LSTM has feedback connections. Because there may be lags of unknown duration between significant events in a time series, LSTM networks are well suited to classifying, analyzing, and making predictions based on sequential data. LSTMs were created to address the exploding and vanishing gradient problems that can occur when training traditional RNNs.
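To make the idea of gates and a memory cell more concrete, here is a minimal NumPy sketch of a single LSTM time step. It uses the commonly taught formulation with input, forget, and output gates. The function name lstm_step, the weight layout, and the toy dimensions are assumptions made for illustration, and the weights are random, not trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,)
    Rows are stacked in the order: input gate, forget gate, candidate, output gate.
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:hidden])                 # input gate: how much new information to write
    f = sigmoid(z[hidden:2 * hidden])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * hidden:3 * hidden])    # candidate values for the cell state
    o = sigmoid(z[3 * hidden:4 * hidden])    # output gate: how much memory to expose
    c_t = f * c_prev + i * g                 # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)                   # updated hidden state (output)
    return h_t, c_t

# Toy dimensions and random weights, purely for illustration.
rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h = np.zeros(hidden)
c = np.zeros(hidden)
sequence = rng.normal(size=(5, input_dim))   # five time steps
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)     # the state is fed back at every step
print(h.shape, c.shape)
```

The additive update of the cell state (f * c_prev + i * g) is what lets information and gradients pass across many time steps more easily than in a plain RNN.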
Sentiment Analysis Using LSTM
To begin, import the necessary libraries and load the data. You can use a dataset imported directly from Kaggle; many other datasets for sentiment analysis of tweets and reviews are also freely available.
Preprocessing of Tweets
We'll now preprocess the tweets by removing extraneous material and converting them to lowercase. We started by using regular expressions to remove @-mentions, the retweet marker 'RT', links, and hashtag symbols, as these add no value to the analysis. It's worth noting that we kept the words that follow a hashtag symbol, because they can carry a valuable connection to the tweet's meaning. To strip punctuation, we joined all of the text into one long string, removed every character in Python's predefined set of punctuation symbols, and then split the result back into individual tweets stored as separate list elements. We also found 835 duplicate entries; removing them reduced the dataset to 17,165 unique tweets. Finally, we specify the vocabulary size to use and apply a tokenizer to transform the tweets into vectors.
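The cleaning steps described above can be sketched as follows. This is a minimal illustration, not the exact code used for the analysis: the function name clean_tweet, the regular expressions, and the sample tweets (including the handle and URL) are assumptions made for the example.

```python
import re
import string

def clean_tweet(text):
    """Apply the cleaning steps described above to one tweet."""
    text = re.sub(r"\bRT\b", " ", text)                  # retweet marker
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # links
    text = re.sub(r"@\w+", " ", text)                    # @-mentions
    text = text.replace("#", " ")                        # drop '#', keep the word after it
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())                        # collapse repeated whitespace

# Made-up sample tweets; the handle and URL are placeholders.
raw_tweets = [
    "RT @example_user: We stand with the farmers! #FarmersProtest https://example.com/abc",
    "We stand with the farmers #FarmersProtest",
    "Contract farming worries many small farmers.",
]

cleaned = [clean_tweet(t) for t in raw_tweets]
unique_tweets = list(dict.fromkeys(cleaned))   # drop duplicates, keep first-seen order
print(unique_tweets)
```

In this sketch, duplicates are removed after cleaning, so a retweet and its original collapse into a single entry.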
import seaborn as sns

sns.countplot(x="verified", data=final);
The plot shows that only a small number of the Twitter accounts are verified, so we'll limit our analysis to the verified accounts.
Tokenization is the process of breaking up a string into tokens. Commonly, these tokens are words, numbers, and/or punctuation. The tensorflow_text package provides a number of tokenizers for preprocessing the text required by text-based models.
pad_sequences is used to ensure that all sequences in a list have the same length. By default, it pads each sequence with 0 at the beginning until every sequence is as long as the longest one. If a maximum length is specified, sequences longer than it are truncated.
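To show what tokenization and padding do to the data, here is a small pure-Python sketch. It only mimics the behaviour described above (integer word indices, 0 used as the padding value, padding and truncation at the beginning of a sequence) and is not the Keras implementation. The function names build_vocab, texts_to_sequences, and pad_pre are hypothetical.

```python
from collections import Counter

def build_vocab(texts, num_words):
    """Map the num_words most frequent words to indices 1..num_words (0 is reserved for padding)."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(num_words), start=1)}

def texts_to_sequences(texts, vocab):
    """Replace each known word by its index; words outside the vocabulary are dropped."""
    return [[vocab[w] for w in text.split() if w in vocab] for text in texts]

def pad_pre(sequences, maxlen=None, value=0):
    """Pad at the front (and truncate from the front) so that all sequences share one length."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = seq[max(len(seq) - maxlen, 0):]            # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

texts = ["we stand with farmers", "farmers protest", "we support farmers"]
vocab = build_vocab(texts, num_words=10)
sequences = texts_to_sequences(texts, vocab)
print(pad_pre(sequences))             # every row has the length of the longest tweet
print(pad_pre(sequences, maxlen=3))   # longer rows lose their first tokens
```

The padded rows form a rectangular integer matrix, which is the input shape an embedding layer expects.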
Defining the model
A sequential model is suitable for a basic stack of layers with one input tensor and one output tensor for each layer. We can create a Sequential model by passing a list of layers.
Here we have passed three layers to the Sequential model: an Embedding layer, an LSTM layer, and a Dense layer.
Dense implements the operation activation(dot(input, kernel) + bias), where kernel is a weight matrix, bias is a bias vector, and activation is an element-wise activation function. Here we have used the softmax function, which serves as the activation function in the output layer of neural network models that predict a multinomial probability distribution.
In other words, softmax is the usual activation function for multi-class classification problems, where each example must be assigned to one of more than two class labels.
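The Dense operation and the softmax activation can be written out in a few lines of NumPy. This is a sketch for illustration only: the weights and the input below are made-up numbers, and subtracting the maximum inside softmax is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = z - np.max(z, axis=-1, keepdims=True)   # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def dense(inputs, kernel, bias, activation):
    """activation(dot(input, kernel) + bias), as in the description above."""
    return activation(np.dot(inputs, kernel) + bias)

# One example with two features, mapped to three classes (illustrative numbers).
inputs = np.array([[1.0, 2.0]])
kernel = np.array([[0.5, -0.5, 0.0],
                   [0.25, 0.25, -0.25]])
bias = np.zeros(3)

probs = dense(inputs, kernel, bias, softmax)
print(probs, probs.sum())
```

The predicted class is the index of the largest probability, which np.argmax(probs, axis=-1) returns.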
To compress the input feature space into a smaller one, we apply an embedding layer. Consider a text classification problem with 40,000 unique words. One option is to preprocess the text and generate a term-document matrix. If we give this matrix to the model as input, it has to learn a weight for each individual feature (40,000 in total), which requires a lot of memory. An embedding layer instead maps each word index to a short dense vector, so the layers that follow work with far fewer input features.
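Conceptually, an embedding layer is a lookup table with one row per word. The NumPy sketch below illustrates this with the 40,000-word vocabulary from the example above. The embedding size of 32 and the word indices are assumptions, and the table holds random values here, whereas in a real model it is learned during training.

```python
import numpy as np

vocab_size, embed_dim = 40_000, 32   # embed_dim is an illustrative assumption
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(scale=0.01, size=(vocab_size, embed_dim))

# Two padded tweets of four word indices each (0 is the padding index).
padded = np.array([[0, 0, 15, 2043],
                   [7, 391, 12, 39999]])

embedded = embedding_matrix[padded]  # look up one row per word index
print(embedded.shape)                # (2, 4, 32)

# The lookup is equivalent to multiplying a one-hot vector by the table,
# without ever building the 40,000-wide one-hot vector.
one_hot = np.zeros(vocab_size)
one_hot[15] = 1.0
print(np.allclose(one_hot @ embedding_matrix, embedding_matrix[15]))
```

Each word is now represented by 32 numbers instead of a 40,000-wide sparse vector.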
categorical_crossentropy is used as the loss function for multi-class classification models with two or more output labels. Each output label is represented as a one-hot encoded vector, made up of 0s and a single 1. If the labels are stored as integers, they are first converted into this categorical encoding using Keras, for example with its to_categorical utility.
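The following NumPy sketch shows what one-hot encoding and categorical cross-entropy compute. It is an illustration with made-up labels and predictions, and it does not use the Keras functions themselves. The helper names and the clipping constant eps are assumptions.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels into one-hot rows of 0s and a single 1."""
    return np.eye(num_classes)[labels]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-example loss: -sum(y_true * log(y_pred)) over the classes."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

labels = np.array([0, 2])                      # integer labels for two tweets
y_true = one_hot(labels, num_classes=3)
y_pred = np.array([[0.7, 0.2, 0.1],            # softmax outputs (made-up values)
                   [0.1, 0.3, 0.6]])

losses = categorical_crossentropy(y_true, y_pred)
print(y_true)
print(losses, losses.mean())
```

Because each one-hot row has a single 1, the loss for an example reduces to the negative log of the probability the model assigned to the correct class.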
Let us evaluate the model's accuracy on the test set:
score, acc = model.evaluate(X_test, Y_test, verbose=2, batch_size=batch_size)
print("score: %.2f" % (score))
print("acc: %.2f" % (acc))
The evaluation output is:
51/51 - 1s - loss: 0.5080 - accuracy: 0.8398
Analyzing Sentiment of a Sample Tweet
In this way, we can run this analysis on many other tweets.
Digital media now let us share our thoughts, ideas, and opinions. Social networks have grown popular not only for this but also for spreading ideas and shaping personal beliefs. Analyzing activity on social media sites can therefore provide insight into the culture and environment around an event. The farmers' protest in India led to a huge increase in the number of tweets in which individuals offered their views, with people from every walk of life expressing their agitation over the issue. In this paper, we have explored ways to understand people's sentiment by building a sentiment analysis model.