Jul 9, 2010

Tweeting the World Cup

A few weekends ago I went down to the Haight to watch the USA vs England World Cup match. I’ve watched other World Cup matches at Mad Dog in the Fog, but I seriously underestimated the crowd that would ride the wave of USA World Cup mania and try to cram into one of the more popular viewing-pubs. Fifteen minutes before kickoff a long line had already formed, complete with metal barriers and an anxious crowd. Instead of waiting impatiently for the odd fan to leave Mad Dog and let the standby-line inside one-by-one, I went two blocks west to Danny Coyles. It was also full, but only to the edge, so I managed to grab a spot just outside the open window with a clear view of one of the screens. As I snuggled into the optimal viewing position, I wondered how many people had tweeted from inside Mad Dog, perhaps noting how crowded it was, or whether others had tweeted about the long lines well before the match started. Naturally, this can be searched, but I also wondered where people might be tweeting from as the game started. Were people heading over to other pubs, just like I went to Danny Coyles? It would be really sweet if I could pull up a map around my current location, find tweets relevant to my interest (in this case, the USA vs England match), and read them on a location-by-location basis. Surely, this has been considered before, right? Hmmm…

If you’re interested, the site is here: http://www.pabloestrada.us/worldcup/

I wondered how difficult it might be to plot local tweets on a Google map and see their geographic distribution, so I checked out the Google Maps API on the recommendation of a friend, who said the API is well documented, easy to understand and straightforward to implement. Indeed, throwing a map onto a web page is almost trivial, and all that’s needed are some very basic parameters, including the latitude and longitude coordinates that define the map’s center. Easy enough, I thought. The next step was slurping some tweets from the Twitter API.
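For reference, here is a minimal sketch of that step against the Google Maps JavaScript API V3. It assumes the `google.maps` namespace is passed in as `maps` (so the function can be exercised without a browser), and the San Francisco coordinates and zoom level are illustrative; `createMap` and `SF_CENTER` are hypothetical names.

```javascript
// Minimal sketch: create a V3 map centered on a given point.
// "maps" is google.maps in the browser; it is injected so the function has no
// hard dependency on a loaded page. Center and zoom values are illustrative.
const SF_CENTER = { lat: 37.7749, lng: -122.4194 };

function createMap(maps, container, center, zoom) {
  const options = {
    zoom: zoom,
    center: new maps.LatLng(center.lat, center.lng), // the map's center point
    mapTypeId: maps.MapTypeId.ROADMAP                 // plain street map
  };
  return new maps.Map(container, options);
}

// In a page with a <div id="map_canvas">:
// const map = createMap(google.maps, document.getElementById("map_canvas"), SF_CENTER, 12);
```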

Pulling data from the Twitter Search API is pretty straightforward and has been implemented elsewhere. Choosing jQuery not only simplified data retrieval but also put its broad set of functions and plugins within reach, and jQuery’s widespread use and well-documented API shortened my time to implementation. Naturally, I leaned on Google search for examples. However, many implementations request plain JSON instead of JSONP. I understand this is done for security reasons (a JSONP response is executed as a script, so the page must trust whatever the remote server sends back), but because the browser’s same-origin policy blocks plain JSON requests to another domain, it necessitates a server-side proxy. I chose JSONP simply because it doesn’t need the proxy.
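Since JSONP is what makes the proxy unnecessary, here is a rough sketch of the mechanism jQuery takes care of when a `$.getJSON` URL contains `callback=?`: register a uniquely named global function, put its name in the request URL, and let the server’s response (a script that calls that function) deliver the data. `prepareJsonp` and the `jsonp_cb_` prefix are hypothetical; this illustrates the idea, not jQuery’s actual implementation.

```javascript
// Sketch of the JSONP idea. The server wraps its JSON in a call to the named
// function, and the browser runs that response as a <script>.
let jsonpCounter = 0;

function prepareJsonp(baseUrl, params, onData) {
  const name = "jsonp_cb_" + (++jsonpCounter); // unique per request
  // Global function the server's response will call with the data.
  globalThis[name] = function (data) {
    delete globalThis[name]; // single use: remove it once called
    onData(data);
  };
  const query = Object.keys(params)
    .map(k => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .concat("callback=" + name)
    .join("&");
  return { url: baseUrl + "?" + query, callbackName: name };
}

// In a browser, the request is sent by injecting a script tag:
// const s = document.createElement("script"); s.src = req.url; document.head.appendChild(s);
```

That injected script runs with the page’s full privileges, which is exactly the security trade-off mentioned above.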

Twitter’s advanced search page let me construct the query and cross-check it against what I’d written from their Search API documentation. The documentation covers all the optional parameters and provides simple examples of their use. However, I only realized later that at least two parameters are missing their range (allowed values): since and until, which restrict the search to a date range. I wanted to restrict each search to the match date so it would only return results tweeted during (or near) the game. The finest resolution for this restriction is a whole day, so it’s not possible to limit the search to certain hours of a date.
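Because the date filters work in whole days, deriving them from a match date is simple. The sketch assumes `YYYY-MM-DD` strings interpreted in UTC and that `until` excludes the day it names, so it is set to the following day; both are assumptions to check against the API documentation, and `dayRange` is a hypothetical helper.

```javascript
// Sketch: whole-day "since"/"until" values for a match date (YYYY-MM-DD).
// Assumption: "until" is exclusive, so it is set to the next day.
function dayRange(isoDate) {
  const start = new Date(isoDate + "T00:00:00Z");              // UTC midnight
  const next = new Date(start.getTime() + 24 * 60 * 60 * 1000); // +1 day
  const fmt = d => d.toISOString().slice(0, 10);                // back to YYYY-MM-DD
  return { since: fmt(start), until: fmt(next) };
}
```

For the USA vs England match, `dayRange("2010-06-12")` gives `{ since: "2010-06-12", until: "2010-06-13" }`. Working in UTC avoids daylight-saving surprises when adding a day.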

At first I tried to use $.ajax to make the query, since it provides a bit more flexibility than $.getJSON. At this point I was testing mostly in Google Chrome and kept getting null as the returned value. Tracing this down, I found an unexpected character reported on line 1 of the HTML file, causing an Uncaught SyntaxError (Unexpected token). That seemed odd, because I certainly wasn’t changing the first (or first few) lines of the HTML file. After some searching I found this may be related to an old defect in Chrome in which a BOM character is inserted at the front of the JSON response, according to this: http://code.google.com/p/chromium/issues/detail?id=176. To avoid it I used $.getJSON instead. It has a bit less flexibility than $.ajax, but I worked around that by having the $.getJSON callback simply call another function, which I defined separately. This also kept the code more modular and easier to read. Once I had a proper query and a good grip on the returned data, I saved the JSON response to a test.js file so I could keep developing without polling the API each and every time I, say, changed the color of some text on the page. It also froze the search results, so I could test tweets with different kinds of location information.
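The thin callback and the frozen test.js data fit together naturally if the data source is a parameter. A minimal sketch, with hypothetical names throughout (`runSearch`, `fixtureSource`, `FIXTURE`, `handleSearchResults`) and a trimmed, illustrative response shape:

```javascript
// Frozen response saved from an earlier query (trimmed, illustrative shape).
const FIXTURE = { results: [{ from_user: "fan1", text: "Go #USA!", location: "SF" }] };

// Stand-in for $.getJSON that returns the frozen data without a network call.
function fixtureSource(url, callback) {
  callback(FIXTURE);
}

// The real work lives in a separately defined function.
function handleSearchResults(data) {
  // Real code would render tweets; this sketch just returns their texts.
  return (data.results || []).map(t => t.text);
}

// "source" is any function(url, callback): the fixture or a wrapper around $.getJSON.
function runSearch(source, url, onDone) {
  source(url, function (data) {
    onDone(handleSearchResults(data)); // keep the callback thin: just delegate
  });
}
```

During development, `runSearch(fixtureSource, url, render)` never touches the network; `runSearch((u, cb) => $.getJSON(u, cb), url, render)` goes live without changing the handler.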

As I looked over the search results, I realized only a very small percentage of tweets are tagged with geolocation data. Those that are make it very easy to jump from tweet to Google map, since the geotags include latitude and longitude: perfect for placing on a map! Unfortunately, most tweets don’t have this data. The Twitter Search API makes it easy to restrict a search to a geographic area; my search was restricted to a 12-mile radius around San Francisco. I quickly noticed how much the location information varied among the tweets. Some users populate their profile with a location description: San Francisco, or SF, or San Francisco, CA. Others populate a location that includes latitude and longitude but is prefixed by a string such as “iPhone: ” or “ÜT: “. Vague descriptions like “SF” really reduce the value of placing tweets on a local (city-scale) map. When comparing tweets between different cities, city-level locations are perfectly valid and probably sufficient resolution for a geographic comparison. For example, one could plot tweets from San Francisco and Los Angeles on one map; in that case, it may not be necessary to know exactly where in San Francisco a tweet originated. But I wanted to distinguish tweets sent from Mad Dog from those sent from Danny Coyles, and that level of resolution just isn’t enough.
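Separating geotagged tweets from the rest might look like the sketch below. It assumes each search result carries a `geo` field that is either null or an object like `{ type: "Point", coordinates: [lat, lng] }` in latitude-first order; verify both the shape and the order against a saved response before relying on it. `partitionByGeo` is a hypothetical name.

```javascript
// Sketch: split results into tweets with usable coordinates and the rest.
// Assumption: geotagged results have geo = { type: "Point", coordinates: [lat, lng] }.
function partitionByGeo(results) {
  const located = [];
  const unlocated = [];
  for (const tweet of results) {
    const g = tweet.geo;
    if (g && Array.isArray(g.coordinates) && g.coordinates.length === 2) {
      located.push({ tweet: tweet, lat: Number(g.coordinates[0]), lng: Number(g.coordinates[1]) });
    } else {
      unlocated.push(tweet); // listed later, but not plotted from this field
    }
  }
  return { located: located, unlocated: unlocated };
}
```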

I decided not to plot tweets with nonspecific location information on the map. Doing so would not be intuitive, because a user naturally associates a pin on a map with that specific geographic location. Where should tweets tagged with the location “SF” be placed on the map? I considered placing them all in, or very close to, the middle of the city (somewhere in or near The Castro). But then how could they be distinguished from a tweet that really did originate in The Castro and was tagged with latitude and longitude coordinates? I decided to list the tweets below the map but only plot tweets with coordinates on it. I used some simple regular expressions to match a few variations in the location field, for example the “iPhone: ” or “ÜT: ” prefixes in front of coordinates.
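A sketch of that kind of regular expression, assuming locations look like `iPhone: 37.769722,-122.432222`, `ÜT: 37.7699,-122.4330`, or a bare `lat,lng` pair (formats inferred from the prefixes mentioned above, not from any specification). Anything else, such as “SF”, returns null and is only listed, not plotted. `parseLocation` and `COORD_RE` are hypothetical names.

```javascript
// Optional "iPhone:" or "ÜT:" prefix, then "lat,lng" with optional decimals.
const COORD_RE = /^(?:(?:iPhone|ÜT):\s*)?(-?\d{1,2}(?:\.\d+)?),\s*(-?\d{1,3}(?:\.\d+)?)$/;

function parseLocation(location) {
  if (typeof location !== "string") return null;
  const m = location.trim().match(COORD_RE);
  if (!m) return null; // free-text locations like "SF" are not plotted
  const lat = parseFloat(m[1]);
  const lng = parseFloat(m[2]);
  // Reject numbers that cannot be real coordinates.
  if (Math.abs(lat) > 90 || Math.abs(lng) > 180) return null;
  return { lat: lat, lng: lng };
}
```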

The next step was to automatically trigger a search based on a World Cup match. The user shouldn’t have to type in “#GER” to search for Germany on the date of the Germany vs Spain semifinal. I placed all 64 matches on the right side of the page with the score for each team. Each team name has a small flag icon next to it, and I had JavaScript extract the team name from the flag’s img src attribute to automatically trigger a search for that team’s hashtag. I also encoded each link’s href attribute with the match number, so the search could key off an array of 64 date values and restrict the search to the match date. This was pretty easy using regular expressions, but I later realized regex was complete overkill, since the length and format of the img src string is the same for every team and every match. Only when I tested in Internet Explorer did I realize this. Thanks, IE, for reminding me of the stings developers endured in years past trying to write cross-browser compatible code for Netscape and IE 4, and who knows what other nightmares. Internet Explorer treats calls to String.split() with regular expressions differently than other browsers (older versions, for example, drop empty strings from the resulting array). But of course! I found a brief explanation on a page titled Inelegant JavaScript. Woohoo! In the end, it was pretty easy to slice() out the substring I needed, and that took care of it. But hey, I learned some simple regular expressions and a browser inconsistency, right?
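A sketch of the slice()-based version. The file and link formats are hypothetical: flag images named like `flags/ger.png` and hrefs ending in `-<matchNumber>`, such as `#match-1`. Only the opening match (June 11, 2010) and the final (July 11, 2010) are filled in; the other dates are omitted rather than guessed.

```javascript
// Hypothetical lookup of match number -> date (YYYY-MM-DD).
const MATCH_DATES = {
  1: "2010-06-11",  // opening match
  64: "2010-07-11"  // final
  // ...remaining match dates omitted
};

// Fixed-format file names make slice() enough: the 3 characters before ".png".
function hashtagFromFlagSrc(src) {
  return "#" + src.slice(-7, -4).toUpperCase();
}

// Read the number after the last "-" in the href and look up its date.
function matchDateFromHref(href) {
  const n = parseInt(href.slice(href.lastIndexOf("-") + 1), 10);
  return MATCH_DATES[n] || null;
}
```

Slicing from the end of the string keeps working even if the browser reports an absolute URL for the image.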

The next step was to take the processed location information from the relevant tweets and place each one on the map. I used each team’s flag in place of the default Google Maps marker. If more than one team’s tweets were plotted on the map, this would produce a visual display of the distribution and relative size of the groups of tweeters mentioning each team. At some point this could even go further and plot some kind of mood indicator for each team’s fans, but that would require analyzing the tweet text and some guesswork. My map only plots one team at a time, but that’s not hard to change. In fact, I had some trouble keeping it to one team, because I wanted to clear all flags from the map whenever a new search was triggered. The previous version (V2) of the Google Maps API had a method to clear all markers (clearOverlays()), but the current version (V3) requires each marker to be tracked so it can be removed individually when the markers are cleared. Luckily, Stack Overflow came to the rescue with a solution for V3.
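The usual V3 pattern is to keep every marker in an array and call `setMap(null)` on each one to clear the map. A sketch along those lines, assuming the `google.maps` namespace is passed in as `maps`; `addFlagMarker`, `clearMarkers`, and the flag image path are hypothetical, and this is not necessarily identical to the Stack Overflow solution mentioned above.

```javascript
// Every marker placed on the map is remembered here.
const markers = [];

// Place a marker using a team flag image instead of the default pin.
function addFlagMarker(maps, map, lat, lng, flagUrl) {
  const marker = new maps.Marker({
    position: new maps.LatLng(lat, lng),
    map: map,
    icon: flagUrl
  });
  markers.push(marker);
  return marker;
}

// Remove all tracked markers before plotting a new search.
function clearMarkers() {
  for (const marker of markers) {
    marker.setMap(null); // detaches the marker from the map
  }
  markers.length = 0; // forget them so they can be garbage-collected
}
```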

I wrote a function that takes the JSON data and the clicked-on team name. For each tweet, it dynamically appends an entry to a container element below the map with the Twitter user’s profile image (avatar), the location information sent with the tweet, and the contents of the tweet. For tweets containing coordinates, another function is triggered to plot the tweet on the Google map with the corresponding team flag. That function also creates an InfoWindow (tooltip) containing the user profile image and tweet contents, making each flag clickable, just like default Google markers. That adds a bit of personalization to each tweet on the map, and it’s cool to see the user’s profile image next to their tweet and team flag.
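A sketch of the InfoWindow step using V3’s `google.maps.InfoWindow` and `google.maps.event.addListener`, again with the namespace passed in as `maps`. The tweet field names (`profile_image_url`, `from_user`, `text`) are assumed from the old Search API’s results, and `tweetHtml`, `escapeHtml`, and `attachInfoWindow` are hypothetical names. Since tweet text ends up inside HTML, it is escaped first.

```javascript
// Escape characters that would otherwise be interpreted as HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Avatar, user name, and tweet text as one HTML snippet.
function tweetHtml(tweet) {
  return '<div class="tweet">' +
    '<img src="' + escapeHtml(tweet.profile_image_url) + '" alt="">' +
    '<strong>' + escapeHtml(tweet.from_user) + '</strong> ' +
    escapeHtml(tweet.text) +
    '</div>';
}

// Open an InfoWindow with the tweet when the marker is clicked.
function attachInfoWindow(maps, map, marker, tweet) {
  const info = new maps.InfoWindow({ content: tweetHtml(tweet) });
  maps.event.addListener(marker, "click", function () {
    info.open(map, marker);
  });
  return info;
}
```

The same `tweetHtml` output can be appended to the list below the map, so both views show identical content.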

With the display working as expected, I wrote another function to build the search URL from the team name, match date, and a few Twitter search parameters. Finally, I disconnected the test.js file and connected the code to the live Twitter Search API, using this function to provide the URL to $.getJSON.
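A sketch of such a URL builder. It assumes the old (since retired) endpoint `http://search.twitter.com/search.json` with `q`, `geocode` (`lat,lng,radius`), `rpp`, `since`, and `until` parameters; treating 100 as the per-page maximum for `rpp` is also an assumption. The trailing `callback=?` is what tells jQuery’s `$.getJSON` to make a JSONP request. `buildSearchUrl` and `SEARCH_BASE` are hypothetical names.

```javascript
// Assumed endpoint of the old Search API (no longer available).
const SEARCH_BASE = "http://search.twitter.com/search.json";

function buildSearchUrl(hashtag, since, until, center, radiusMiles) {
  const params = {
    q: hashtag,                                                          // e.g. "#USA"
    geocode: center.lat + "," + center.lng + "," + radiusMiles + "mi",   // search area
    rpp: 100,                                                            // results per page
    since: since,                                                        // YYYY-MM-DD
    until: until                                                         // YYYY-MM-DD
  };
  const query = Object.keys(params)
    .map(k => k + "=" + encodeURIComponent(params[k]))
    .join("&");
  // "callback=?" is left unencoded so jQuery can substitute its own callback name.
  return SEARCH_BASE + "?" + query + "&callback=?";
}
```

For the USA vs England match with a 12-mile radius around San Francisco, the call would be `buildSearchUrl("#USA", "2010-06-12", "2010-06-13", { lat: 37.7749, lng: -122.4194 }, 12)`.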

There is still room for improvement. For example, I’ve done nothing about the speed of searches or to make content updates look smoother. Nothing fails gracefully, either: if a search yields no results or fails for another reason, or if a user clicks on a future match (which triggers a search on a future date), nothing happens and no error is shown. Finally, this whole experiment will expire in its current form a few days or weeks after the end of the World Cup, when the match dates become too old for the Twitter Search API. Just like the tournament, it’s been fun while it lasted.
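None of these checks exist on the site yet, but a first cut could be a single guard that runs before rendering. A sketch, assuming `YYYY-MM-DD` date strings (which compare correctly as plain strings) and a hypothetical `error` field on failed responses; `searchProblem` is a hypothetical name.

```javascript
// Returns a message to show the user, or null when results can be drawn.
function searchProblem(matchDate, today, data) {
  if (matchDate > today) {
    return "This match hasn't been played yet, so there are no tweets to show.";
  }
  if (!data || data.error) {
    return "The search failed. Please try again.";
  }
  if (!Array.isArray(data.results) || data.results.length === 0) {
    return "No tweets found for this match.";
  }
  return null;
}
```

In the browser, `today` could come from `new Date().toISOString().slice(0, 10)`, which gives the current UTC date.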