Continuing my investigation of Jeff Gentry's twitteR package, I decided to take a look at the distribution of Twitter users' followers.

As a rough place to start, I examined the distribution of followers *for those who follow me* – that is, I first gather a dataframe with all my followers, then I look at the number of followers those users have. Fortunately, Jeff's package makes this really easy (see my code below).

Since I was expecting a distribution with a very long right tail, I decided to plot the logarithm of the number of followers.
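
To illustrate why the log helps, here is a minimal sketch using simulated counts (not my actual follower data). The log-normal parameters and the `skewness` helper are my own assumptions for the example; the point is only that a heavily right-skewed variable becomes roughly symmetric once logged.

```r
# Simulated follower counts (NOT real Twitter data): a log-normal sample.
set.seed(42)
followers <- round(rlnorm(650, meanlog = 6, sdlog = 1.5))
followers <- followers[followers > 0]  # log(0) is -Inf, so drop zero counts

# Hypothetical helper: sample skewness (third standardized moment).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

raw_skew <- skewness(followers)       # strongly positive: long right tail
log_skew <- skewness(log(followers))  # much closer to zero after the log
```

Comparing `raw_skew` with `log_skew` shows how much of the tail the transformation absorbs.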

The result looked remarkably close to a normal distribution, which suggests that the raw follower counts are roughly log-normal. This was surprising given my small sample size (I have about 650 followers).
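
Eyeballing a density plot is only a rough check. One way to probe the "looks normal" impression a little further is sketched below, again on simulated data rather than my real followers (the sample parameters are assumptions). It compares sample quantiles of the logged values against normal quantiles and runs a Shapiro-Wilk test from base R.

```r
# Simulated follower counts (NOT real Twitter data).
set.seed(42)
followers <- rlnorm(650, meanlog = 6, sdlog = 1.5)
log_followers <- log(followers)

# Normal Q-Q coordinates without drawing a plot; a correlation near 1
# means the points fall close to a straight line.
qq <- qqnorm(log_followers, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(log_followers)
sw$p.value
```

With real follower data, remember to drop users with zero followers before taking the log.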

To give a sense of reference, I added the log-follower count for some famous folks (and plotted my own as well).

    # note: you'll need to save the 'credentials' file, and load
    # it before you can access twitter data.
    # for help with this, see this post:
    # https://sites.google.com/site/dataminingatuoc/home/data-from-twitter/r-oauth-for-twitter
    library(twitteR)
    load("C:/Users/Mark/Documents/twitteR_credentials")
    registerTwitterOAuth(Cred)

    me = getUser("M_T_Patterson", cainfo = "cacert.pem") # this works

    # What can I learn about a user?
    me$getFavorites(cainfo = "cacert.pem")
    fl = me$getFollowers(cainfo = "cacert.pem")

    # one row per follower: screen name, id, follower count, location
    df = data.frame(name = sapply(fl, function(x) x$screenName),
                    id = sapply(fl, function(x) x$id),
                    #last.tweet.date = sapply(fl, function(x) x$lastStatus$created),
                    followers = sapply(fl, function(x) x$followersCount),
                    location = sapply(fl, function(x) x$location))

    # sorting by number of followers:
    df.f = df[order(df$followers, decreasing = TRUE),]
    head(df.f, 50)

    #(Not run)
    #library(ggplot2)
    #p = ggplot(df.f, aes(x = log(followers))) + geom_density()
    #p + geom_text(aes(log(refs$followers), y = 0.3, label = refs$name, fill = "blue", size = 5))

    # this is interesting -- it looks like a log-normal distribution.

    # adding some references:
    refs = data.frame(
      name = c("Graduate Student:\nMark Patterson",
               "Famous R Statistician:\nHadley Wickham",
               "Famous Journalist:\nThomas Friedman",
               "Famous Heartthrob:\nJustin Bieber"),
      followers = c(656, 5446, 234686, 46602072))

    # a bit more on the density of the distribution at various points:
    dens = density(log(df.f$followers))
    refs$log.followers = log(refs$followers)

    # find the density value at the grid point closest to val:
    dens.lookup = function(val){
      dens$y[which.min(abs(val - dens$x))]
    }
    refs$dens = sapply(refs$log.followers, dens.lookup)

    library(ggplot2)
    p = ggplot(df.f, aes(x = log(followers))) + geom_density()

    # the labels get their own data (refs), so the four reference rows
    # are not matched against the rows of df.f
    p + geom_text(data = refs,
                  aes(x = log.followers, y = dens, label = name),
                  size = 5, color = "blue", inherit.aes = FALSE) +
      theme(legend.position = "none") +
      scale_x_continuous(limits = c(0, 20)) +
      labs(title = "Density of log(Followers) on twitter")
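
The density lookup above takes the value at the nearest grid point of the kernel density estimate. As a possible alternative, here is a small sketch that interpolates linearly between grid points with base R's `approx()`. The data are simulated and the names `nearest_lookup`, `interp_lookup` and `query` are hypothetical, chosen just for this example.

```r
# Simulated log-follower values (NOT real Twitter data).
set.seed(42)
log_followers <- rnorm(650, mean = 6, sd = 1.5)
dens <- density(log_followers)

# Nearest-grid-point lookup, as in the code above.
nearest_lookup <- function(val) dens$y[which.min(abs(val - dens$x))]

# Linear interpolation between grid points; rule = 2 returns the edge
# value for points outside the estimated range instead of NA.
interp_lookup <- function(val) approx(dens$x, dens$y, xout = val, rule = 2)$y

query <- c(4, 6, 8)  # hypothetical log-follower values to look up
nearest <- sapply(query, nearest_lookup)
interp <- interp_lookup(query)
```

On the default 512-point grid the two approaches should give nearly identical heights, so the choice matters little for placing labels on a plot.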

