Making AP Election Data Easy with Fusion Tables

This post is for journalists who use (or would like to use) election data from the Associated Press -- which is a paid service the AP provides. If that describes you, read on!

When Google gave away free, live election data for the Iowa Caucuses, something struck me right away: It was easy.

Data provided by the Associated Press, which drives almost every election site you've ever seen, is notoriously tricky to manage -- a statement I'm confident making based on talks with many election-night veterans and on my own experience.

But Google's results were posted in a public Google Fusion Table, which is basically a spreadsheet on steroids. That meant I could get the data I wanted simply by constructing the correct URL. Votes by county, sorted by county? No problem. Candidate totals for the entire state? Sure. Votes mashed with other data I had? Yup. Formatted in JSON? Bring it.

Instantly. Easily.

(Here are the URLs I used above, and here's the documentation from Google on how to construct them. Hard-to-find tip: Append &jsonCallback=anything to get the json. And if you're using jQuery AJAX calls, make it &jsonCallback=?)

A week later, for the New Hampshire Primary, there were no free Google data. So I made an AP data-fetcher-and-wrangler based on code by Al Shaw. Through no fault of Al's code, my adaptation was slow, complicated and crashed every couple of hours. It worked, but just barely.

Next up was South Carolina, and I was determined to make AP's data friendlier by putting in a Google Fusion Table.

And it worked.

How I did it

In the interest of time and clarity -- and to spark discussion before the primaries are over (or irrelevant) -- I'm leaving out a bunch of the nitpicky details. If you're an AP Elections subscriber and want to try this, contact me at john (at) johnkeefe.net. I'll help you any way I can.

AP provides data in several formats, including a "flat file," which basically is a huge, semicolon-delimited spreadsheet. Each row represents a county, and each column the latest stats for that county, such as precincts reporting and each candidate's total votes.

The flat file doesn't have column headers, though. So I first uploaded AP's South Carolina test table to Google Spreadsheets and added the column names I needed.

I then imported the spreadsheet into a non-public Google Fusion Table.

For election night, I set up a script on my computer that does the following steps every two minutes:

1. Logs into AP's servers via FTP and downloads the flat file.

2. Deletes the data from the Google Fusion Table I made earlier and uploads the entire flat file anew. This is accomplished with a little Python program written by the brilliant (and patient) Kathryn Hurley, of the Google Fusion Tables team. I've posted it here with her permission. I don't know Python, but didn't need to. I just needed to make sure the list of columns in the data_import.py exactly matched the columns in my table. So I cut-and-pasted them from the Google spreadsheet. The script executes the command:

python data_import.py [google account username] [flat file filepath] [fusion table id]

3. Next, it hits the Fusion Table with a simple URL request formatted to return the data I want as JSON. This is the URL I used for getting the county totals.

4. Then it sends that JSON as a file, via FTP, to a subdirectory of my map application on WNYC's servers.

Once a minute, the election map running in the user's browser looks at that data file to get the latest info.

In this way, I completely avoided the need to build and maintain a database. I know there are great database folks out there, but I'm not one of them. The Fusion Table became my database.

Technically, I could skip steps 3 and 4 by simply pointing my map application at the Fusion Table to get the data it needs. That's what I did for Iowa, using the free Google data. But the table would be publicly visisble on Google's servers ... and my reading of the AP contract, understandably, doesn't allow that.

I strongly believe that the easier AP's data is to use, the more budding journocoders will make new election-night interactives. And if we can work together to do that, let's. For me, this method was a lot easier than anything else I had tried before.

A final note: If you're a Python-savvy programmer, be sure to check out what the LA Times has shared to make life easier, too. It's pretty slick.