In a previous post, I described a recent study in which we found that including hashtags in a tweet enhances the retweetability of the tweet. In this post, I’ll focus on another factor that might affect retweetability: the URL.
As reported in my previous post, we collected a random sample of public tweets from Twitter’s Spritzer feed over a 7-week period, yielding about 74 million tweets. Of these, we identified 8.24 million as retweets; that is, 11.1% of the 74 million tweets are retweets.
Next, we searched for those tweets and retweets that contain at least one URL. We found that 21.1% of tweets and 28.4% of retweets include URLs, suggesting that a tweet with URLs is more likely to get retweeted.
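As a rough check on this reading, the two percentages can be combined into the share of URL-bearing tweets that are retweets. The sketch below assumes that the 21.1% figure is measured over all 74 million sampled tweets, retweets included; the post does not state this explicitly, and the variable names are purely illustrative.

```python
# Sample sizes and percentages reported in the post.
TOTAL_TWEETS = 74_000_000
TOTAL_RETWEETS = 8_240_000
URL_SHARE_TWEETS = 0.211    # fraction of tweets with at least one URL
URL_SHARE_RETWEETS = 0.284  # fraction of retweets with at least one URL

# Assumption: the 21.1% figure covers all sampled tweets, retweets included.
url_tweets = URL_SHARE_TWEETS * TOTAL_TWEETS
url_retweets = URL_SHARE_RETWEETS * TOTAL_RETWEETS

# Share of URL-bearing tweets that are retweets, versus the overall share.
retweet_share_with_url = url_retweets / url_tweets
overall_retweet_share = TOTAL_RETWEETS / TOTAL_TWEETS
lift = retweet_share_with_url / overall_retweet_share

print(f"{retweet_share_with_url:.1%} vs {overall_retweet_share:.1%} (ratio {lift:.2f})")
```

Under that assumption, roughly 15% of tweets with a URL are retweets, compared with 11.1% overall. The ratio is simply 28.4/21.1, or about 1.35.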
We further investigated whether the retweetability of a tweet depends on the type of website it refers to. Since most of the URLs included in tweets are shortened, we first expanded each shortened URL into its original URL, and then extracted the domain name from the original URL. [For example, given the shortened URL http://bit.ly/c1htE cited by a tweet, we first unshortened it to http://en.wikipedia.org/wiki/URL_shortening, and then extracted the domain name en.wikipedia.org.] The URL domains are indicative of the type of content sources visited and shared by Twitter users.
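The domain-extraction step can be sketched with Python's standard library. This is a minimal illustration rather than the code used in the study; the helper name `extract_domain` is hypothetical, and the expansion step is left out because it requires a network request that follows the shortener's redirects.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the lowercased host name of an already-expanded URL.

    Returns an empty string when the input has no recognizable host.
    """
    host = urlparse(url).hostname
    return host or ""


# The expanded URL from the example above.
print(extract_domain("http://en.wikipedia.org/wiki/URL_shortening"))
```

Note that this keeps subdomains as they are (en.wikipedia.org rather than wikipedia.org), which matches the example in the text.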
Analyzing the 74 million tweets, we identified the 20 most popular URL domains referred to in our tweets and the number of tweets containing each URL domain:
|Rank|URL Domain|Number of Tweets|
On the other hand, the following table shows the 20 most popular URL domains cited in our 8.24 million retweets and the number of retweets containing each URL domain:
|Rank|URL Domain|Number of Retweets|
As can be seen, these two lists of URL domains do not match each other exactly. For example, formspring.me appears only in the first list, while mashable.com appears only in the second list. That is, the fact that a website is frequently cited in tweets does not guarantee that it is also frequently cited in retweets, and vice versa.
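A comparison of this kind can be sketched as follows: count domains in each collection, take the top N of each, and look at the set differences. The helper `top_domains` is a hypothetical name, and the tiny input lists are made up for illustration; they are not the study's data.

```python
from collections import Counter


def top_domains(domains, n=20):
    """Return the n most frequent domains, most frequent first."""
    return [domain for domain, _ in Counter(domains).most_common(n)]


# Made-up toy data: one domain per tweet or retweet that contains a URL.
tweet_domains = [
    "twitpic.com", "twitpic.com",
    "formspring.me", "formspring.me",
    "mashable.com",
]
retweet_domains = ["twitpic.com", "mashable.com", "mashable.com"]

top_in_tweets = set(top_domains(tweet_domains, n=2))
top_in_retweets = set(top_domains(retweet_domains, n=2))

# Domains that make one top list but not the other.
only_in_tweets = top_in_tweets - top_in_retweets
only_in_retweets = top_in_retweets - top_in_tweets
print(only_in_tweets, only_in_retweets)
```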
For each URL domain, we computed a retweet rate by dividing the number of retweets containing the domain by the number of tweets containing the domain. We then normalized the rate by dividing it by the average retweet rate of 11.1%, so that a value of 1.0 represents the average case. [For example, for twitpic.com, the retweet rate of 1.47 was calculated as (129,692/793,680)*(74/8.24).] A retweet rate higher than 1.0 indicates that, compared to the average case, tweets containing this domain have a higher chance of getting retweeted.
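The normalization can be written out as a short function. This is a sketch of the calculation described above, with a hypothetical function name and parameter names; the twitpic.com counts and the sample totals are the ones quoted in the text.

```python
def normalized_retweet_rate(domain_retweets, domain_tweets,
                            total_retweets, total_tweets):
    """Retweet rate of a domain relative to the overall retweet rate.

    A result of 1.0 means the domain is retweeted at the average rate.
    """
    if domain_tweets <= 0 or total_retweets <= 0 or total_tweets <= 0:
        raise ValueError("counts must be positive")
    domain_rate = domain_retweets / domain_tweets
    average_rate = total_retweets / total_tweets
    return domain_rate / average_rate


# twitpic.com: 129,692 retweets out of 793,680 tweets citing the domain.
rate = normalized_retweet_rate(129_692, 793_680, 8_240_000, 74_000_000)
print(round(rate, 2))  # 1.47
```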
The following table shows the retweet rates for the 10 most popular URL domains cited in our tweets:
|Rank|URL Domain|Retweet Rate|
As can be seen from the above table, the retweet rates vary greatly depending on the URL domains. For example, formspring.me, which is the 5th most popular domain, has a retweet rate of 0.05, suggesting that tweets containing that domain are very unlikely to be retweeted. On the other hand, the retweet rate of twitlonger.com is 6.07, suggesting that tweets containing that domain have high retweetability.
In the following plot, we show the retweet rates of the 50 most popular URL domains. The X-axis is the popularity rank of URL domains based on how many tweets contain each domain. The Y-axis represents the retweet rates of domains as computed above.
You can learn more from our paper about this work.
Editor: Sonal Chokshi