At 10pm on Wednesday 12 February, Kim Dot Com sent the following tweet:
A script fetched every tweet about @JohnKeyPM in the past 6 months. Analysis: 79% of the tweets were negative. #KeyCantWin2014
— Kim Dotcom (@KimDotcom) February 12, 2014
Now at face value that seems like a reasonable thing to claim. However, as with any claim about sentiment across a wide sample of the population, it is important to know how the conclusion was reached.
Three major things need to be considered:
– How the data was collected
– The way “negative” was defined
– The way “negative” was assessed
I will look at each of these separately.
How was the data collected: The important thing to ask is whether the script collected only tweets that specifically contained the handle @JohnKeyPM, or whether it also searched for tweets containing the phrase "John Key". If I am tweeting about the PM, in most cases I will not include his handle. Most of the time the PM, be they National or Labour, is being talked about, not to. If you want an accurate portrayal of sentiment on Twitter, you need to search for both the handle and the phrase to capture all mentions of the PM. And did they look at the raw number of tweets, or did they count the number of users who had tweeted negative things?
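To illustrate, here is a minimal sketch of what a more complete collection query could look like, assuming Twitter's search API via the tweepy library; the bearer token is a placeholder and the query is my assumption, not whatever Dot Com's script actually ran:

```python
# Minimal sketch, assuming Twitter's v2 recent-search endpoint via tweepy.
# BEARER_TOKEN is a placeholder credential, not a real value.
import tweepy

client = tweepy.Client(bearer_token="BEARER_TOKEN")

# Search for the handle OR the exact phrase, so tweets that talk *about*
# the PM are captured as well as tweets directed *at* him.
query = '(@JohnKeyPM OR "John Key") -is:retweet'

response = client.search_recent_tweets(query=query, max_results=100)
for tweet in response.data or []:
    print(tweet.text)
```

Note that the standard recent-search endpoint only covers the last seven days; a six-month sample like Dot Com's would need full-archive access, which is yet another detail his tweet leaves unexplained.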
The way "negative" was defined: As we all know, the English language can be notoriously difficult for conveying tone in written form. Sarcasm, facetiousness, double meanings, puns: they are all very hard to pick up in text, especially after the fact. If the definition of "negative" is too broad or too narrow, it can skew the results.
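As a toy illustration (the word list and example tweets are invented, not taken from any real script), here is how a crude keyword definition of "negative" misfires on both negation and sarcasm:

```python
# Toy illustration of how a crude definition of "negative" skews results:
# a simple word list misreads negation and sarcasm.
NEGATIVE_WORDS = {"bad", "fail", "worst", "useless"}

def is_negative(text: str) -> bool:
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & NEGATIVE_WORDS)

print(is_negative("John Key is not bad at all"))         # True (wrong: it's praise)
print(is_negative("Oh great, another brilliant policy"))  # False (wrong: it's sarcasm)
```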
The way "negative" was assessed: Once they had settled on a definition of negative, did they use software to automatically mark tweets as positive or negative? Or did they have people individually check each tweet, and its place in the string of tweets, to decide whether it was negative? If a tweet included the PM's handle and a negative word that was aimed at someone else in defence of the PM, was that counted as negative towards the PM?
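For the automated route, a minimal sketch using the off-the-shelf VADER sentiment analyser (my choice of tool, not necessarily what Dot Com's script used) shows exactly that failure mode: a tweet defending the PM still scores as negative because it contains hostile words aimed at someone else.

```python
# Sketch using the vaderSentiment package (pip install vaderSentiment).
# The example tweet is invented; VADER is an assumption, not Dot Com's tool.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# A tweet *defending* the PM, but full of hostile words aimed at a critic.
defensive = "Stop spreading lies about @JohnKeyPM, your attacks are disgraceful"

# compound ranges from -1 (most negative) to +1 (most positive);
# this tweet scores well below zero despite supporting the PM.
print(analyzer.polarity_scores(defensive)["compound"])
```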
There are many more questions that could be asked about the methodology behind Dot Com's statistic. When a pollster like Colmar Brunton or Roy Morgan releases a new political poll, they provide a detailed explanation of the methodology they used and how many people were sampled, and they sample each individual only once. On Twitter, by contrast, two people may be tweeting, one in support of the PM and one attacking him; the supporter may tweet once or twice while the attacker tweets 10-20 times. How was this handled?
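To see why the raw-count versus unique-user distinction matters, consider a toy tally (the data is invented): one supporter tweeting twice and one attacker tweeting fifteen times look like an 88% negative sample by raw count, but an even split by user.

```python
# Sketch of why raw tweet counts and per-user counts diverge.
# Counting each user's majority verdict once stops one prolific
# attacker from swamping the result.
from collections import Counter, defaultdict

tweets = [("supporter1", "pos"), ("supporter1", "pos")] \
         + [("attacker1", "neg")] * 15

raw = Counter(label for _, label in tweets)
print(raw)  # Counter({'neg': 15, 'pos': 2}) -> "88% negative"

by_user = defaultdict(Counter)
for user, label in tweets:
    by_user[user][label] += 1
per_user = Counter(c.most_common(1)[0][0] for c in by_user.values())
print(per_user)  # one 'pos' user, one 'neg' user -> evenly split
```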
Dot Com uses social media very effectively to get his message out there without any need, or requirement, to explain his statements. We have all seen how misguided polls can be: Colin Craig claimed his candidate would win the electorate of Rodney, which turned out to be significantly wrong.
When surveys are undertaken with stringent controls on the methodology, social media can offer interesting insights into wider public sentiment. But when the methodology and assumptions behind a statistic are not made clear, we must be sceptical of its veracity. If Dot Com wishes to be taken seriously as a political player, he needs to be able to back up his claims with evidence.