Jul 01

The Twitter Equation

An article I read recently posited that a user's Twitter follower count is driven primarily by how long that user has been a member. This made me curious, so I set out to check the claim myself.

The first step was to pull down data for a large number of Twitter users. I did this using two endpoints of Twitter’s API: search and users/show. Specifically, I searched for recent tweets containing the word “the”, with the intention of pulling out a relatively unbiased sample of English-speaking users. As of this moment, I have data for 5,526 unique users, covering number of followers, number of people followed, days on Twitter, and number of tweets sent.

Second, the data needed to be cleaned. My random sample included, for example, LeVar Burton of Reading Rainbow fame (1,757,213 followers), as well as an account that tweets out a quote from the Bible every 2 minutes (686,334 tweets). Here are the raw distributions of follower and tweet count:

After removing the top 1% in both categories, we have distributions that are a lot less extreme:
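For anyone who wants to reproduce the trimming step, here is a minimal Python sketch. It is not the code I ran; the function name `trim_top_percent` and the record layout (a list of dicts with `followers` and `tweets` keys) are assumptions made for illustration.

```python
import math


def trim_top_percent(users, keys, percent=1.0):
    """Drop every user who falls in the top `percent` of any listed key.

    `users` is a list of dicts; `keys` names the numeric fields to trim on.
    Ties at the cutoff are dropped too, so slightly more than `percent`
    of the records can be removed when values repeat.
    """
    cutoffs = {}
    for key in keys:
        values = sorted(u[key] for u in users)
        # Number of records that make up the top `percent` (at least one).
        n_drop = max(1, math.ceil(len(values) * percent / 100))
        # Smallest value that still belongs to the top group.
        cutoffs[key] = values[len(values) - n_drop]
    # Keep only users strictly below the cutoff on every key.
    return [u for u in users if all(u[k] < cutoffs[k] for k in keys)]
```

Calling `trim_top_percent(users, ["followers", "tweets"])` removes a user who is extreme in either category, which is how I read "the top 1% in both categories".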

The next step was to load the data into R and run a quick multiple linear regression. This gives us the so-called “Twitter equation”:

# of followers = (# of days on Twitter * 0.0672) + (# of people followed * 0.9849) + (# of tweets * 0.0107)
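The fit itself was done in R. As a rough stand-in, the sketch below solves the same kind of least-squares problem in plain Python. Two assumptions to flag: the printed equation has no constant term, so the sketch fits without an intercept, and the function name `fit_no_intercept` is made up for this example.

```python
def fit_no_intercept(rows, targets):
    """Ordinary least squares without an intercept, via the normal equations.

    rows:    list of feature lists, e.g. [days, following, tweets] per user
    targets: list of follower counts, one per row
    Returns one coefficient per feature.
    """
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y], built column by column.
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * y for row, y in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col] / aug[col][col]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[col])]
    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [aug[i][k] / aug[i][i] for i in range(k)]
```

Given one `[days, following, tweets]` row per user and the matching follower counts, the three returned numbers play the role of the coefficients in the equation above.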

The model has an adjusted R-squared value of 0.5605; this value represents “the proportion of variability in a data set that is accounted for by a statistical model”. In other words, the model explains just over half of the variation in follower count around the mean of 566 followers. The remaining variability is not captured by the model and presumably comes from other factors, such as being interesting.
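To make the statistic concrete, here is a small Python sketch of the textbook definitions. The function names are illustrative, and the adjustment shown is the standard one for a model with an intercept; R may compute the figure differently for a fit without one, so treat this as a sketch rather than a reproduction of the 0.5605 above.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)
```

With thousands of observations and only three predictors, the adjustment barely moves the number, so the adjusted and unadjusted values are nearly identical here.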

So what does the Twitter equation tell us? Simply, that the number of tweets sent barely matters, longevity of account matters somewhat, and the number of users followed matters a lot. As an example, being on the site for two years is projected to account for about 50 followers. Sending 2 tweets per day over that same period would net about 16 followers. Lastly, following ~600 people (the average in my dataset) works out to a whopping 590 followers.
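The arithmetic behind those three figures can be checked with a few lines of Python. The coefficients come straight from the equation; the names `COEFFICIENTS`, `follower_contributions`, and `predicted_followers` are just labels chosen for this sketch, and two years is taken as 730 days.

```python
# Coefficients copied from the Twitter equation.
COEFFICIENTS = {"days": 0.0672, "following": 0.9849, "tweets": 0.0107}


def follower_contributions(days, following, tweets):
    """Followers attributed to each term of the equation."""
    inputs = {"days": days, "following": following, "tweets": tweets}
    return {name: COEFFICIENTS[name] * value for name, value in inputs.items()}


def predicted_followers(days, following, tweets):
    """Total predicted followers: the sum of the three contributions."""
    return sum(follower_contributions(days, following, tweets).values())


# Two years on the site, 2 tweets per day, following 600 people.
example = follower_contributions(days=730, following=600, tweets=2 * 730)
```

Here `example` comes out to roughly 49 followers from account age, 16 from tweets, and 591 from following, for a predicted total of about 656.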

  1. dfkoz posted this