If you are one of the billions of people who use social media, it’s quite possible that one of your posts has been used for sentiment analysis, especially if you’ve talked about a particular company. Love Pepsi over Coca-Cola? Pepsi loves that, though Coca-Cola may have something to say about it…after they’ve taken note. It would be a positive sentiment for Pepsi, but possibly a negative one for Coca-Cola if they’re looking specifically for people who prefer their products. Had a bad experience at Chick-Fil-A? Chick-Fil-A has taken note of this negative comment. Your opinions of these brands and companies help them decide what to do next, since it is important to them that consumers are satisfied with what they’re producing. But exactly how are these companies able to turn what you’re tweeting or posting into some type of emotion?
Machine learning isn’t just a buzzword; it has become a major part of our everyday lives. From the personalized ads and product recommendations that we receive to brain tumor anomaly detection, machine learning has had a major impact on how we operate and progress as a society. While there are many processes and sub-topics under the huge umbrella that is machine learning, the focus of this article is sentiment analysis. In short, sentiment analysis is the process of detecting emotion in text.(4) Typically, businesses use sentiment analysis to gauge the overall sentiment of their clients and customers, for example to measure the reputation of a company or how consumers feel about a particular product.
Sentiment Analysis Process
Sentiment analysis is a text classification tool capable of determining whether the text being analyzed is neutral, positive, or negative. It can also be expanded to go beyond emotion detection. One example is intent, where an algorithm detects the reason behind a user’s message.(2) If a user has posted about an issue they’re having with a company’s service, the algorithm may classify that post as a complaint. If that same user asks how to use the service, the post may instead be classified as a query. These categories can be used to route posts to their respective departments so that users receive further assistance. Urgency is another metric that can be detected from text.(4)
Natural Language Processing (NLP) (discussed in more detail in “Text Translation and You”) refers to the various processes that allow a computer to dissect, analyze, and interpret language and text in the way that humans can. In a previous Insights Article, NLP was used for language translation, whereas here we are using it for sentiment classification. Sentiment analysis is a type of text classifier. Text classifiers are a class of machine learning models that do just what the name says: they classify text. To build one, you have to define what the classifier is analyzing and which categories the text can be classified into.
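To make the idea of a text classifier concrete, here is a minimal sketch of one common approach, a Naive Bayes classifier with add-one smoothing, written in plain Python. It is not the method used by any particular company or tool mentioned in this article. The "complaint" and "query" categories come from the intent example above, while the training messages and the function names are made up purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-written training messages (not real customer data).
TRAINING = [
    ("my order arrived broken and nobody will help", "complaint"),
    ("the app keeps crashing and i am frustrated", "complaint"),
    ("i was charged twice and this is unacceptable", "complaint"),
    ("how do i reset my password", "query"),
    ("where can i find my invoice", "query"),
    ("how do i change my delivery address", "query"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count how often each word appears under each category."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the category with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen word/category pairs above zero.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAINING)
print(classify("how do i cancel my subscription", MODEL))        # query
print(classify("the app is broken and i am frustrated", MODEL))  # complaint
```

With only six training messages this is a toy, but swapping the categories for "positive", "neutral", and "negative" and supplying labeled examples would turn the same structure into a basic sentiment classifier.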
To conduct a simple, baseline sentiment analysis, one would go through the following process:
If you don’t already have the text or data ready, you can also use a sample dataset to test on. You are also free to use an API if you desire to pull in live data from a specific source, such as Twitter or Instagram.
The text that you’re analyzing should first be broken down into its component parts. How you break it down will vary depending on your goal, but generally this will include sentences, phrases, tokens, and parts of speech, the last of which are labeled through parts of speech (PoS) tagging.(3) This is important because a word by itself may carry a different sentiment than it does inside a phrase or an entire sentence. Accurate PoS tagging will help you create more accurate classifications.
For each component, likely starting off with sentences and phrases, you will need to identify those with possible sentiment and assign a “sentiment score” to each one. This score is usually between -1 (negative) and +1 (positive), with 0 being neutral.
When all of this is put together and plotted, one can see how much of the entire text was classified as negative, neutral, or positive.
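The steps above can be sketched in a few lines of Python. This is only a baseline illustration under stated assumptions: the tiny word lexicon, its scores, the 0.1 threshold, and the sample text are all hypothetical, and a real project would use a much larger, curated lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon mapping words to scores between -1 and +1.
LEXICON = {
    "love": 0.8, "great": 0.7, "happy": 0.6,
    "bad": -0.6, "sucks": -0.8, "terrible": -0.9,
}


def split_sentences(text):
    """Break the text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Break a sentence into lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def score_sentence(sentence):
    """Average the scores of known words; 0.0 (neutral) if none are found."""
    hits = [LEXICON[token] for token in tokenize(sentence) if token in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(score, threshold=0.1):
    """Map a score in [-1, +1] to a negative, neutral, or positive label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def summarize(text):
    """Count how many sentences fall into each sentiment grouping."""
    return Counter(label(score_sentence(s)) for s in split_sentences(text))


# Made-up sample text, used only to exercise the functions above.
SAMPLE = "I love this phone. The battery is terrible. It arrived on Tuesday."
print(summarize(SAMPLE))
```

The counts returned by `summarize` are the numbers you would put on a bar chart to see how the text splits across the three groupings.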
Machine learning can take things a step further in different ways. Instead of going through and identifying the parts of speech by hand (noun by noun, verb by verb, etc.), a model can be trained to identify parts of speech and speed up the tagging process for you. There are libraries, such as the Natural Language Toolkit (NLTK for short), that can help you build and train a model. It’s still recommended that you check over the parts of speech tagging, as some phrases or words may be tagged in a way you did not intend. For example, how would you classify the phrase “Cap!”? How might someone else classify it? The context of your text is important in deciding how such a phrase should be classified.
Case Study: Sample Tweets from Twitter
Twitter is one of the best and easiest places to gather data. With the Twitter API, you can collect as many tweets as you need to build or test an algorithm. There are also other places, such as Kaggle, where you can find preexisting datasets for training and testing. There is a project called Sentiment140, spearheaded by three Stanford students, which “allows you to discover the sentiment of a brand, product, or topic on Twitter”.(1) Below is a short sample of the data they collected, from which we will examine a few tweets.
There are a few columns to note first (from left to right):
Excel row number.
Polarity of the tweet (Similar to the above-mentioned “sentiment score”). 0 means “negative”, 2 means “neutral”, and 4 means “positive”.
Tweet ID (we won’t need this for our example)
Date tweet was posted.
Topic of the Tweet
Twitter User ID
Text of the tweet
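The column layout above can be read programmatically. The sketch below assumes the data is stored as a CSV file with six fields per row in the order listed (polarity, tweet ID, date, topic, user, text), with the Excel row number being a spreadsheet artifact rather than a field in the file. The `Tweet` class and `parse_rows` function are illustrative names, and every field in the sample row other than the polarity and the tweet text is a placeholder, not real data.

```python
import csv
import io
from dataclasses import dataclass

# Polarity codes as described in the column list above.
POLARITY_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


@dataclass
class Tweet:
    polarity: int
    tweet_id: str
    date: str
    topic: str
    user: str
    text: str

    @property
    def sentiment(self):
        """Translate the numeric polarity into a readable label."""
        return POLARITY_LABELS[self.polarity]


def parse_rows(handle):
    """Yield one Tweet per CSV row (assumes exactly six fields per row)."""
    for row in csv.reader(handle):
        polarity, tweet_id, date, topic, user, text = row
        yield Tweet(int(polarity), tweet_id, date, topic, user, text)


# Placeholder ID, date, topic, and user; only polarity and text come from the article.
SAMPLE_CSV = (
    '"2","<tweet-id>","<date>","<topic>","<user>",'
    '"On my way to see Star Trek @ The Esquire."\n'
)

tweets = list(parse_rows(io.StringIO(SAMPLE_CSV)))
for tweet in tweets:
    print(tweet.sentiment, "|", tweet.text)
```

To read a downloaded file instead of the in-memory sample, pass an open file handle to `parse_rows`, adjusting the encoding and field order if your copy of the data differs from the assumption made here.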
It should be noted that there are emoticons included in the figure above, and the owners of the project treated “ :) ” as positive and “ :( ” as negative.(1) Let’s take a random tweet, such as that in row 53: “On my way to see Star Trek @ The Esquire.”
This was rated as a neutral tweet, which makes sense as there’s no language to indicate otherwise. There are no words associated with positivity, like “happiness”, or negativity, like “sucks” or “unexcited”. An algorithm can also take symbols into account so that, in this case, “@” can be recognized as part of an email address or as marking a location.
Another example tweet will be that in row 60:
“omg. The commercials alone on ESPN are going to drive me nuts.”
To a human reader this is obviously a negative tweet, based on the context and the phrase “drive me nuts”. This is an example of where accurate PoS tagging and phrase detection are important: taken word by word, the computer would have little to tell it how to categorize the tweet.
The last example to further show the difference between the positive, neutral, and negative sentiments would be this tweet from row 63:
“Hello Twitter API. ;)”
By itself, “Hello Twitter API.” would normally be neutral, much like the first example we looked at. Because this tweet includes “ ;) ”, the computer noted it as “positive”, since this one emoticon is considered part of the tweet’s context. Examples like this show how a difference of one word, or one emoticon in this case, can change the overall context and sentiment of the text.
While falling under the umbrella of NLP, sentiment analysis is a machine learning technique that analyzes text and outputs the sentiment most closely related to its overall context. The output can be as simple as an outlook (positive, negative, or neutral) or as complex as an emotion (angry, happy, or sad). Sentiment analysis is another avenue that can help your business determine how your clients and consumers feel about your products or services. Many companies, such as Google and Twitter, use this technique in market research to gauge how they should update their services based on how users use and react to their platforms. Customer service chatbots can let a person describe how they’re feeling, which helps the chatbot decide what response to issue next and how urgent the request is. These are just a few examples of how this technology is currently being used, and it’s improving every day. The next innovation in this technology could very well be a project completed by you.
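The effect of a single emoticon can be illustrated with a small rule-based sketch. This is not the Sentiment140 model itself; it only mimics the idea of treating emoticons as sentiment signals. The source states that “ :) ” was treated as positive and “ :( ” as negative; the other emoticons in the sets below, including “ ;) ”, are assumptions added for illustration.

```python
# ":)" and ":(" come from the project description; the rest are assumed.
POSITIVE_EMOTICONS = (":)", ":-)", ";)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def emoticon_label(text):
    """Label text by the emoticons it contains; neutral if none or mixed."""
    has_positive = any(emoticon in text for emoticon in POSITIVE_EMOTICONS)
    has_negative = any(emoticon in text for emoticon in NEGATIVE_EMOTICONS)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    return "neutral"


print(emoticon_label("Hello Twitter API. ;)"))  # positive
print(emoticon_label("Hello Twitter API."))     # neutral
```

In practice a rule like this would be one signal among many, combined with the word-level and phrase-level scores rather than used on its own.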
1. Go, Alec, et al. “A Twitter Sentiment Analysis Tool.” Sentiment140, http://help.sentiment140.com/.
2. Gupta, Shashank. “Sentiment Analysis: Concept, Analysis and Applications.” Medium, Towards Data Science, 7 Jan. 2018, https://towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17.
3. “Sentiment Analysis Explained.” Lexalytics, Lexalytics, https://www.lexalytics.com/technology/sentiment-analysis.
4. “Sentiment Analysis: A Definitive Guide.” MonkeyLearn, MonkeyLearn, https://monkeylearn.com/sentiment-analysis/.