On 28th August 2012, Datuk Seri Dr Rais Yatim (Minister of Information, Communications and Culture) announced that a world record of one million tweets was targeted for the Merdeka Day celebrations. To take part, Twitter users needed to send tweets from 8.15 PM – 9.15 PM (GMT +8) on 31st August 2012 using the hashtag #Merdeka55.
Politweet has tracked mentions of the #Merdeka55 hashtag since the announcement. During the targeted hour, an odd pattern emerged in the live stream: large blocks of identical tweets were being sent at the same time.
Further investigation revealed that a small group of users was responsible for a large volume of tweets. These users shared similar characteristics, e.g. account creation date, profile photos, location and follower/following relationships. All of their duplicate tweets were sent using TweetDeck. We will call these users ‘clones’ and will expose their methods and their impact on the stats in a separate blog post.
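Our own detection tooling isn't described in this post, but as a rough illustration, blocks of identical tweets like these can be surfaced by grouping tweets on (minute, normalised text, client) and flagging groups posted by several distinct accounts. This is a hypothetical sketch: the record fields (`user`, `text`, `source`, `created_at`), the function name and the 3-account threshold are assumptions, and the sample data is made up.

```python
from collections import defaultdict
from datetime import datetime


def find_duplicate_blocks(tweets, min_users=3):
    """Return {(minute, text, source): users} for identical tweets that at
    least `min_users` distinct accounts sent in the same minute from the
    same client. Tweet dicts use hypothetical keys."""
    groups = defaultdict(set)
    for t in tweets:
        # Truncate the timestamp to the minute so near-simultaneous posts group together.
        minute = t["created_at"].replace(second=0, microsecond=0)
        # Normalise case and whitespace so trivial differences don't split a block.
        text = " ".join(t["text"].lower().split())
        groups[(minute, text, t["source"])].add(t["user"])
    return {k: users for k, users in groups.items() if len(users) >= min_users}


# Hypothetical sample: three accounts post the same text via TweetDeck in one minute.
sample = [
    {"user": u, "text": "Selamat Hari Merdeka #Merdeka55", "source": "TweetDeck",
     "created_at": datetime(2012, 8, 31, 20, 20, s)}
    for u, s in [("clone1", 1), ("clone2", 2), ("clone3", 3)]
] + [
    {"user": "someone", "text": "Merdeka! #Merdeka55", "source": "Twitter for iPhone",
     "created_at": datetime(2012, 8, 31, 20, 20, 30)},
]

blocks = find_duplicate_blocks(sample)
for (minute, text, source), users in blocks.items():
    print(minute, source, len(users), text)
```

Grouping on the client as well as the text reflects the observation above that all duplicates came from one client; a real investigation would also compare account metadata such as creation dates.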
Stats for the hour follow.
Total for the hour
Tweets: 109,320
Users: 19,838
This graph shows tweets per minute (TPM) from 8.10 PM – 9.20 PM. Tweets rose almost vertically at 8.15 PM. The highest peaks were 2,146 TPM at 8.24 PM and 2,104 TPM at 8.37 PM. Tweets started to decline at 9.11 PM, then spiked for one minute at 9.15 PM. After that, tweet levels increased as news of the 2.5 million tweet record broke.
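A TPM series like the one in the graph is just a count of tweets per clock minute. A minimal sketch, assuming timestamps are already converted to GMT+8; the function names and sample timestamps are hypothetical, not our production code or real data:

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count tweets in each clock minute by dropping seconds."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def top_peaks(tpm, n=2):
    """Return the n busiest minutes as (minute, count), highest first."""
    return tpm.most_common(n)


# Hypothetical timestamps (GMT+8), not the real #Merdeka55 data.
stamps = [
    datetime(2012, 8, 31, 20, 24, 1),
    datetime(2012, 8, 31, 20, 24, 40),
    datetime(2012, 8, 31, 20, 24, 59),
    datetime(2012, 8, 31, 20, 37, 10),
    datetime(2012, 8, 31, 20, 37, 11),
    datetime(2012, 8, 31, 20, 50, 0),
]
tpm = tweets_per_minute(stamps)
print(top_peaks(tpm))
```

The same count, run over the full hour, is what yields peak figures such as 2,146 TPM at 8.24 PM.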
Location of #Merdeka55 tweets
Tweets were coming in from all across the country. Globally, there were only 5 tweets from outside Malaysia: Myanmar, Switzerland, London, Indonesia and Singapore. It’s safe to say that #Merdeka55 hashtag usage was almost entirely confined to Malaysia.
The most retweeted (RT) tweets of the hour are listed below, in order. We count the number of unique users who retweeted, not the number of retweets, to reduce the impact of spammers. The RT counts shown cover only retweets made on August 31st of tweets sent between 8.15 PM and 9.15 PM.
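Counting unique retweeters rather than retweets, with the two time filters above, might look like the sketch below. The record fields (`retweeter`, `rt_time`, `original_id`, `original_time`) and the function name are hypothetical, times are assumed to be in GMT+8, and treating 9.15 PM as an exclusive end is our assumption.

```python
from collections import defaultdict
from datetime import date, datetime

# Target hour and retweet day in GMT+8 (end treated as exclusive: an assumption).
WINDOW_START = datetime(2012, 8, 31, 20, 15)
WINDOW_END = datetime(2012, 8, 31, 21, 15)
RT_DAY = date(2012, 8, 31)


def rank_by_unique_retweeters(retweets, n=5):
    """Rank original tweets by how many distinct users retweeted them."""
    retweeters = defaultdict(set)
    for rt in retweets:
        if rt["rt_time"].date() != RT_DAY:
            continue  # keep only retweets made on August 31st
        if not WINDOW_START <= rt["original_time"] < WINDOW_END:
            continue  # keep only originals sent during the target hour
        # A set stores each retweeter once, so repeat RTs by a spammer count once.
        retweeters[rt["original_id"]].add(rt["retweeter"])
    ranked = sorted(retweeters.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(tweet_id, len(users)) for tweet_id, users in ranked[:n]]


# Hypothetical retweet records, not real data.
t0 = datetime(2012, 8, 31, 20, 30)
sample = [
    {"retweeter": "u1", "rt_time": datetime(2012, 8, 31, 20, 31), "original_id": "A", "original_time": t0},
    {"retweeter": "u1", "rt_time": datetime(2012, 8, 31, 20, 32), "original_id": "A", "original_time": t0},
    {"retweeter": "u2", "rt_time": datetime(2012, 8, 31, 20, 33), "original_id": "A", "original_time": t0},
    {"retweeter": "u3", "rt_time": datetime(2012, 8, 31, 20, 34), "original_id": "B", "original_time": t0},
    {"retweeter": "u4", "rt_time": datetime(2012, 9, 1, 0, 5), "original_id": "B", "original_time": t0},
]
print(rank_by_unique_retweeters(sample))  # [('A', 2), ('B', 1)]
```

Here u1's second retweet of A adds nothing, and u4's retweet of B is dropped because it was made on September 1st.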
1. WardinaSafiyyah, 412 RTs[tweet https://twitter.com/WardinaSafiyyah/status/241514491878199296]
2. KhairyKJ, 303 RTs[tweet https://twitter.com/Khairykj/status/241521048825262080]
3. WardinaSafiyyah, 262 RTs[tweet https://twitter.com/WardinaSafiyyah/status/241515475929034752]
4. KhairyKJ, 206 RTs[tweet https://twitter.com/Khairykj/status/241519309883588609]
5. KhairyKJ, 159 RTs[tweet https://twitter.com/Khairykj/status/241511446029148160]
This is one of the most popular tweets used by the clones. The same message was sent by 178 users, scripted to go out at preset times that day.[tweet https://twitter.com/ibrahim_zamri/status/241513568791584771]
How many #Merdeka55 tweets were really sent?
A total figure of 3,611,323 tweets was announced at 9.43 PM that night. But immediately after 9.15 PM the figure announced was 2.5 million. The announcement of the record was not accompanied by any source. No company or online tracking service was named.
It is not clear which figure is correct for the one-hour window, but 3.6 million is what the organiser announced as the record, so we will use that for our calculations.
This graph from downrightnow.com was screen-captured at midnight on August 31st. We marked the graph with lines indicating each hour from 6 PM – 10 PM. It shows fluctuation and lower quality of service from 8 PM – 10 PM.
Twitter’s performance drops when its system is under heavy load, which is to be expected if 3.6 million tweets were sent. Based on this graph at the time, the announced figure seemed believable.
From our experience with the #GOP2012 and #DNC2012 conventions so far, our approach appears to capture about 16% – 28% of the real total. However, that estimate depends on the global tweets-per-minute (TPM). If global TPM is high, then we get significantly more. If we used only Twitter Search, we would have captured an estimated 8% of the real total.
There were some comments online saying that Malaysia’s population needs to be taken into account when comparing with the USA. That does not really apply here, because we are not measuring how many people are talking about Merdeka Day. Instead, we are looking at how many people are competing to set a record, and some users are expected to tweet multiple times to contribute to the goal.
Estimating the real total
The convention totals announced by Twitter cover a period of several hours, not one hour, and the conventions’ tweets per hour were certainly lower than 3.6 million. So, assuming we captured 8% – 28% of the real total:
- Estimated total (min) = 109,320 / 28 * 100
- Estimated total (min) = 390,428
- Estimated total (max) = 109,320 / 8 * 100
- Estimated total (max) = 1,366,500
Based on our data, the estimated total of #Merdeka55 tweets is 390,428 – 1,366,500.
Estimating the tweets-per-minute (TPM)
Twitter’s system gives us a per-minute sample of what is tweeted. By taking our highest per-minute peak as a share of our hourly total and applying that share to the announced total, we can estimate the peak TPM of the real data.
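Both bounds are the same division at each end of the assumed coverage range. A short sketch that reproduces them; the function name is ours, and the 8% and 28% coverage figures are the assumptions stated above, not measured constants.

```python
def estimate_total(sampled, coverage_pct):
    """Scale a sampled count up to an estimated real total, assuming the
    sample captured `coverage_pct` percent (an integer) of all tweets."""
    return sampled * 100 // coverage_pct  # integer maths, rounds down like the figures above


SAMPLED = 109_320
low = estimate_total(SAMPLED, 28)   # best case: we saw 28% of tweets
high = estimate_total(SAMPLED, 8)   # worst case: we saw only 8%
print(f"{low:,} - {high:,}")  # 390,428 - 1,366,500
```

Integer floor division is used so the result matches the truncated 390,428 exactly instead of depending on floating-point rounding.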
- Highest peak = 2,146 TPM
- Percentage of our total = 2,146/109,320 * 100 = 1.963 %
- Given total = 3,611,323
- Estimated peak = 3,611,323 * 1.963 %
- Estimated peak = 70,890 TPM
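The peak estimate scales the announced total by the peak minute’s share of our sample. A minimal sketch of that arithmetic; the function name is hypothetical, and it carries the same implicit assumption as the steps above, namely that our sample’s peak is the same fraction of the real total as it is of ours.

```python
def estimate_peak_tpm(peak_sampled, total_sampled, announced_total):
    """Apply the peak minute's share of our sample to the announced total.

    Assumes the peak is the same fraction of the real total as of our sample."""
    share = peak_sampled / total_sampled
    return announced_total * share


exact = estimate_peak_tpm(2_146, 109_320, 3_611_323)
rounded = 3_611_323 * 0.01963  # share rounded to 1.963%, as in the steps above
print(round(exact), round(rounded))  # 70892 70890
```

The two results differ by about 2 TPM only because the steps above round the share to 1.963% before multiplying.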
During the Olympics, Twitter reported its biggest TPM records as:
- Usain Bolt winning the gold in the 200m sprint (80,000+ TPM)
- Usain Bolt winning the gold in the 100m sprint (74,000+ TPM)
- Andy Murray winning the gold in men’s tennis singles (57,000+ TPM)
That would put #Merdeka55 just below both of Usain Bolt’s gold medal wins. It is surprising that such a record went unnoticed by Twitter.
Was #Merdeka55 a world record?
Only Twitter and its data provider partners (Gnip, DataSift) know the true number of tweets sent for any given topic. Other online systems only have access to a subset of tweets sent, using the same API that Politweet used.
Twitter tends to announce tweets-per-minute and tweets-per-second records, not tweets-per-hour. The closest relevant record seems to be the 2.7 million tweets about Spain during the #Euro2012 final against Italy, which would cover about 2 hours or more (90-minute match + 15-minute half-time + post-match buzz).
So, assuming the figure is accurate, the 3.6 million tweets could be a world record. However, to date Twitter has made no announcement on its blog about #Merdeka55, and no other tracking website has mentioned the #Merdeka55 record. Without a third party to verify the data, the 3.6 million figure is doubtful.
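Because the Euro 2012 total is spread over at least two hours, its average hourly rate can be at most about 1.35 million. A quick arithmetic sketch, taking the 2-hour lower bound on duration from the text; note that this compares averages, and the busiest hour of the final may well have been higher than its average.

```python
EURO_TWEETS = 2_700_000
EURO_HOURS = 2  # "about 2 hours or more", so this gives an upper bound on the average rate
MERDEKA_TWEETS = 3_611_323  # announced figure, not verified
MERDEKA_HOURS = 1

euro_per_hour = EURO_TWEETS / EURO_HOURS
merdeka_per_hour = MERDEKA_TWEETS / MERDEKA_HOURS
print(f"Euro 2012 final: <= {euro_per_hour:,.0f} tweets/hour")
print(f"#Merdeka55 (announced): {merdeka_per_hour:,.0f} tweets/hour")
print(f"Ratio: {merdeka_per_hour / euro_per_hour:.1f}x")
```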
The presence of clones also reduces the quality of the record. If the person or organisation in charge of these clones hadn’t polluted the data, whatever record was achieved would have had more historical value.
Update #1 (7th September 2012)
Corrected a typo under ‘How many #Merdeka55 tweets were really sent?’. The original text was “If global TPM is high, then we get significantly less”. The correct version is “If global TPM is high, then we get significantly more”. Twitter’s Streaming API provides access to a percentage of tweets based on how much is being tweeted globally at the moment. This is stated to be 1%, but we found it to be more.
This does not mean we can only get 1% of tweets on any topic. Think of the limitation as a ceiling on how much data can be received per minute. For example, if our limit is 4,000 TPM and tweets about @NajibRazak total 3,000 TPM, we would get 100% of those tweets. If we track tweets about both @NajibRazak (real total 3,000 TPM) and @BarackObama (real total 3,000 TPM), the combined 6,000 TPM exceeds our 4,000 TPM limit, so we would lose 2,000 TPM.
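Twitter does not publish the exact formula behind this ceiling, so the following is a simplified, hypothetical model: it assumes the ceiling is a fixed percentage of global TPM and is shared by every tracked term. The function names and the 400,000 and 800,000 global TPM values are made up to reproduce the 4,000 TPM example.

```python
def stream_limit(global_tpm, rate_pct=1):
    """Hypothetical per-minute ceiling: a fixed percentage of global TPM.

    1% is the stated rate; we observed more, so treat this as a lower bound."""
    return global_tpm * rate_pct // 100


def received_and_lost(limit_tpm, real_tpm_by_topic):
    """Return (received, lost) TPM when all tracked topics share one ceiling.

    How the losses are split between topics is not modelled."""
    total = sum(real_tpm_by_topic.values())
    received = min(total, limit_tpm)
    return received, total - received


limit = stream_limit(400_000)  # made-up global TPM giving the 4,000 TPM limit above
print(received_and_lost(limit, {"@NajibRazak": 3000}))  # (3000, 0)
print(received_and_lost(limit, {"@NajibRazak": 3000, "@BarackObama": 3000}))  # (4000, 2000)

# Higher global activity raises the ceiling, so the same two topics now fit.
print(received_and_lost(stream_limit(800_000), {"@NajibRazak": 3000, "@BarackObama": 3000}))  # (6000, 0)
```

The last call illustrates why a high global TPM lets us capture more of a topic: the ceiling grows with global activity while the topic’s own volume stays the same.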