Australians are renowned for our sarcastic and witty humour. In fact, many people around the world consider Australians to be some of the most sarcastic people on the planet. We like to express ourselves by making fun of our friends, even ourselves, in the name of a good laugh. It is generally done in jest and is a form of friendly endearment rather than an intended insult. Many tourists or international students can find it confusing or sometimes offensive when first exposed to Aussie humour.
In this blog post, we'll explore a few techniques for detecting sarcasm in Aussie tweets. Let’s see if we can teach a computer to detect sarcasm when it can leave even humans scratching their heads.
Australians on Politics
We sourced over 110k Australian election-related tweets from the 2019 federal election and fed them through a sentiment analysis algorithm. At first glance, our results showed that Aussies generally felt quite positive about the election.
Let’s take a look at some of these “positive” tweets:
I love how Mr Bill Shorten used “sophistication” many times #auspol
Nah, it’s cool, we don’t need a planet anyway #auspol #ElectionResults2019
Good on ya, Queensland. #ImBeingVerySarcasticRightNow #Election2019Results #AUSVote19 #auspol
It seems our basic sentiment algorithm has not been able to detect sarcasm; what a surprise.
Machine learning & neural network techniques
A computer reads sentences a bit differently from a human - it breaks each sentence down into a series of numbers and decides sentiment based on those numbers. The problem is that phrases like “it’s cool” and “Good on ya, Queensland” map to numbers associated with positivity, while the surrounding context is sarcastic. In the case of politics, the use of sarcasm is generally reserved for a more negative tone.
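To make that concrete, here is a minimal sketch of what “a series of numbers” means, using NLTK’s off-the-shelf VADER scorer on one of the tweets above (just an example scorer for illustration, not necessarily the algorithm behind our results):

```python
# Minimal illustration of lexicon-based sentiment scoring, assuming NLTK is
# installed. VADER is used here purely as an example scorer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
scorer = SentimentIntensityAnalyzer()

tweet = "Nah, it's cool, we don't need a planet anyway #auspol"
print(scorer.polarity_scores(tweet))
# Returns negative/neutral/positive/compound scores. "cool" carries positive
# weight in the lexicon, which tends to pull the compound score upwards,
# even though the tweet is plainly sarcastic.
```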
We used a number of techniques to detect sarcasm in these tweets, including:
Logistic Regression and Linear SVM: These are traditional machine learning models that can be applied to various types of data, including text. In NLP, they are often used for tasks like sentiment analysis, classification, and, in our case, sarcasm detection (a minimal training sketch for both follows this list).
Logistic regression - which gave us interpretable and reasonably robust performance (we’ll explain further below)
Linear SVM (support vector machine) - a more sophisticated machine learning method which ended up yielding similar results to logistic regression
Neural Network (LSTM): LSTMs (Long Short-Term Memory) are a type of recurrent neural network (RNN) architecture, and they are commonly used in NLP tasks. They are especially effective in capturing sequential dependencies in data, making them suitable for tasks where the order of words matters, such as sentiment analysis or sarcasm detection.
Although a popular method for text analysis, we found this technique prone to overfitting in our initial experiments
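Here is a minimal training sketch for the two traditional models, assuming a labelled set of tweets (1 = sarcastic, 0 = not). The placeholder data and TF-IDF settings below are purely illustrative, not our production configuration:

```python
# Sketch: TF-IDF features + logistic regression / linear SVM for sarcasm
# detection. The four labelled tweets are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

tweets = [
    "Good on ya, Queensland. #ImBeingVerySarcasticRightNow",  # sarcastic
    "Nah, it's cool, we don't need a planet anyway",          # sarcastic
    "Great turnout at the polling booth this morning",        # genuine
    "Congratulations to the winning candidate",               # genuine
]
labels = [1, 1, 0, 0]

# Turn each tweet into a vector of word and bigram TF-IDF weights.
vectoriser = TfidfVectorizer(ngram_range=(1, 2))
X = vectoriser.fit_transform(tweets)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("linear SVM", LinearSVC())]:
    model.fit(X, labels)
    print(name, model.predict(vectoriser.transform(["Oh great, another election ad"])))
```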
For the purposes of this blog, we decided to continue with logistic regression because it is the simplest to interpret and to explain how it works.
At a high level, this model identified keywords or phrases which were most prevalent in sarcastic text. These “phrases” were then given a “weighting” or score; the more of these “phrases” present in the tweet, the more likely it was to be tagged as sarcastic.
Here is an example of the top 10 “phrases” or features our model identified:
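The actual feature list from our model isn't reproduced in text here, but with a logistic regression the “weightings” can be read straight off the fitted coefficients. A self-contained sketch (using the same hypothetical labelled tweets as above) of how such a top-10 list is pulled out:

```python
# Sketch: reading the top "phrases" out of a fitted logistic regression.
# The coefficients are the weightings; the TF-IDF vocabulary names them.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = [
    "Good on ya, Queensland. #ImBeingVerySarcasticRightNow",
    "Nah, it's cool, we don't need a planet anyway",
    "Great turnout at the polling booth this morning",
    "Congratulations to the winning candidate",
]
labels = [1, 1, 0, 0]  # hypothetical: 1 = sarcastic, 0 = not

vectoriser = TfidfVectorizer(ngram_range=(1, 2))
X = vectoriser.fit_transform(tweets)
model = LogisticRegression(max_iter=1000).fit(X, labels)

# Largest positive coefficients = strongest pull towards the sarcastic class.
# (On scikit-learn versions before 1.0, use get_feature_names() instead.)
names = np.array(vectoriser.get_feature_names_out())
top10 = np.argsort(model.coef_[0])[::-1][:10]
for phrase, weight in zip(names[top10], model.coef_[0][top10]):
    print(f"{phrase}: {weight:.2f}")
```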
Let’s take a look at the example below. Our machine learning model picked up the word “just”, which commonly appears in sarcastic text. The word carries even more weight when it is used before a positive word such as “love” or “lovely” in a negative context, e.g. “I just love..” or "isn't it just lovely..".
Also, just lovely to wake up to hear ScoMo won the election. So much for ending climate change and improving the lives of everyone under 40. Cool cool cool fine fine fine. #auspol
This is a basic example of how machine learning begins to learn sarcasm. However, our model is not perfect and has not successfully detected sarcasm in every tweet. Consider the tweet below:
Amaze Balls, The Power Of Bill Short-On Details Is Mind Blowing! #Not #AusPol #AusVotes2019
A person with more context and understanding of Australian politics could probably have picked this up. For a machine, however, it can be tricky. When spoken, sarcasm can be identified by the speaker's tone and facial expression; in written form, it can be far more ambiguous.
How about this one?
Our model has picked this up as sarcastic. We can't say for certain; what do you think?
We used our sarcasm model to adjust the initial sentiment analysis.
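The exact adjustment rules aren't spelled out above, but at a high level the step looks like this sketch (the column names, probabilities and 0.5 cut-off are assumptions for illustration only):

```python
# Sketch: downgrading "positive" tweets that the sarcasm model flags.
# Column names, values and the 0.5 threshold are illustrative assumptions.
import pandas as pd

scored = pd.DataFrame({
    "text": ["Good on ya, Queensland.", "Great turnout at the booth today"],
    "sentiment": ["positive", "positive"],
    "sarcasm_probability": [0.81, 0.12],
})

sarcastic_positive = (scored["sarcasm_probability"] >= 0.5) & (scored["sentiment"] == "positive")

# Sarcastic "positive" tweets become neutral; negative is reserved for tweets
# whose wording is also explicitly negative.
scored.loc[sarcastic_positive, "sentiment"] = "neutral"
print(scored)
```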
There is definitely sarcasm in the Twitter data. Our model reclassified 13% of the positive tweets as neutral or negative (note: negative is reserved for tweets that are more explicitly negative).
Our natural next question: where do the most sarcastic Aussies live?
Sarcastic folks: where the bloody hell are you?
We’ve used our model to score each tweet and assigned the tweets to Australian SA4 (Statistical Area Level 4) regions.
Looks like South Australia really stands out from the rest, along with parts of Brisbane, Sydney and Perth. We looked at capital cities below to pull out some stats.
This plot shows the proportion of sarcastic tweets relative to total tweets in each city, with the dotted line representing the national average.
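For anyone wanting to reproduce a chart like this, here is a rough sketch with made-up figures (the sarcasm flags below are placeholders, not our results):

```python
# Sketch: share of sarcastic tweets per capital city, with a dotted line at
# the national average. The sarcasm flags here are placeholder data.
import pandas as pd
import matplotlib.pyplot as plt

scored = pd.DataFrame({
    "city": ["Sydney", "Melbourne", "Brisbane", "Perth",
             "Adelaide", "Darwin", "Canberra", "Hobart"] * 3,
    "is_sarcastic": [0, 1, 0, 1, 1, 1, 0, 0] * 3,
})

by_city = scored.groupby("city")["is_sarcastic"].mean().sort_values()
national_average = scored["is_sarcastic"].mean()

ax = by_city.plot(kind="barh")
ax.axvline(national_average, linestyle=":", label="national average")
ax.set_xlabel("Proportion of sarcastic tweets")
ax.legend()
plt.tight_layout()
plt.show()
```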
On a national average, about 1 in 4 tweets are sarcastic.
That's a pretty high figure, although it is important to note that Twitter is not perfectly reflective of real-world election sentiment. For this case study, we'll conclude that there is ample evidence in our data that Aussies are pretty sarcastic.
In terms of cities specifically, Darwin, you guys are definitely taking the lead here with 1 in 3 tweets being sarcastic! Canberra… can't say I’m too surprised there.
Final Thoughts
Natural language processing is a very complex task. Despite the advancements in the field, the detection of sarcasm remains a challenging task, and there are certain limitations to the current methods used for detecting sarcasm in text.
What we have highlighted in our analysis is that the context (i.e. Australian politics in this case) and the medium in which the text is written play a large part in training text-based models. Sarcasm is often used in a particular context and can be influenced by various factors, such as the speaker's tone of voice, facial expression, and the surrounding circumstances. However, in text-based communication, such as tweets, emails, or chat messages, these cues are missing, making it difficult for NLP algorithms to accurately detect sarcasm.
Many companies have started to explore the gold mine of free-form writing to better understand and target their customers. One application we've seen is large financial institutions using text sentiment analysis to track sentiment towards news articles. Our analysis shows that it is still very important to use other methods, such as human judgment, to verify the results of NLP algorithms.
About EdgeRed
EdgeRed is an Australian boutique consultancy specialising in data and analytics. We draw value and insights from data through data science and artificial intelligence to help companies make faster and smarter decisions.
Subscribe to our newsletter to receive our latest data analysis and reports directly to your inbox.