I plot the frequency of Wikipedia searches for “Behavioral Economics” and “Beer”. Who knew the correlation would be 0.7!
Data on Wikipedia search frequency for any article (going back to 2007) are available at http://glimmer.rstudio.com/pssguy/wikiSearchRates/. The site lets you download the number of hits per day as a CSV file, which is what I've done here.
# Behavioral Economics and Beer:
# Author: Mark T Patterson
# Date: March 18, 2013

# Clear Workbench:
rm(list = ls())

# libraries:
library(lubridate)
library(ggplot2)
## Find out what's changed in ggplot2 with
## news(Version == "0.9.1", package = "ggplot2")
# data:
curr.wd = getwd()
setwd("C:/Users/Mark/Desktop/Blog/Data")
ts = read.csv("BehavEconBeer.csv", header = TRUE)
setwd(curr.wd)

# cleaning the dataset:
str(ts)
ts$date = as.character(ts$date)
ts$date = mdy(ts$date)
## Using date format %m/%d/%Y.
ts = ts[, -1]
Note: the mdy() function comes from the lubridate package, which handles date/time data cleanly. I've also dropped the first column of the data, which just holds row names inherited from Excel.
p = ggplot(ts, aes(x = date, y = count)) +
  geom_line(aes(color = factor(name)), size = 2)
p
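The post quotes a correlation of 0.7 without showing the calculation. Below is a minimal sketch of one way to get such a number from the long-format data. It assumes the cleaned data frame has columns `date`, `count` and `name` (the ones used in the ggplot call) with exactly two distinct series; the helper name `series_cor` is hypothetical and not from the original post.

```r
# Hypothetical helper (not from the original post): correlation between the
# two daily series stored in long format. Assumes columns `date`, `count`
# and `name`, with exactly two distinct values of `name`.
series_cor = function(df) {
  # One data frame of (date, count) per series; drop = TRUE ignores unused factor levels.
  parts = split(df[, c("date", "count")], df$name, drop = TRUE)
  stopifnot(length(parts) == 2)
  # Inner join on date so each pair of counts refers to the same day.
  wide = merge(parts[[1]], parts[[2]], by = "date", suffixes = c(".a", ".b"))
  cor(wide$count.a, wide$count.b)
}
```

Under those assumptions, `series_cor(ts)` would give the correlation of daily hits. Because `merge()` performs an inner join, days missing from either series are simply dropped rather than treated as zero.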
It turns out the pattern we observe isn't unique at all: many series follow predictable day-of-week cycles. That doesn't necessarily mean, though, that the correlation between beer and behavioral economics is entirely spurious!
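One simple way to check whether some correlation survives the weekly cycle (not something the original post does) is to subtract each series' average count for each weekday and correlate the residuals. The sketch below assumes the same `date`, `count` and `name` columns; `dow_adjusted_cor` is a hypothetical name, and regressing on weekday dummies would be an equivalent alternative.

```r
# Hypothetical check (not part of the original analysis): strip out each
# series' average day-of-week profile, then correlate what is left.
# Assumes the same long-format columns `date`, `count` and `name`.
dow_adjusted_cor = function(df) {
  # weekdays() returns locale-dependent names; they only serve as group labels.
  df$dow = weekdays(as.Date(df$date))
  # Residual = count minus that series' mean count on the same weekday.
  df$resid = df$count - ave(df$count, df$name, df$dow, FUN = mean)
  parts = split(df[, c("date", "resid")], df$name, drop = TRUE)
  stopifnot(length(parts) == 2)
  wide = merge(parts[[1]], parts[[2]], by = "date", suffixes = c(".a", ".b"))
  cor(wide$resid.a, wide$resid.b)
}
```

On made-up data where two series share the same weekly shape but move in opposite directions from week to week, the raw correlation is close to 1 while `dow_adjusted_cor` returns -1. With real data, a residual correlation still well above zero would suggest the link is more than a weekly artifact, though it says nothing about causation.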