Charting Local Olympic Data

The WNYC Data News team isn't just about maps. We dig into all kinds of structured data -- and the 2012 Olympics will generate a bunch of it.

There are some great efforts afoot to follow the Games, with the New York Times doing amazing work as always.

Our slice of that effort is the Team NYC Olympian Tracker, which we made to help our audience follow who's competing today, who's in contention for a medal and who's won medals. 

WNYC's Team NYC Olympian Tracker

The entire application fed by a Google Spreadsheet, which is linked to the chart by a cool data tookit called Miso, from The Guardian and Bocoup.  Since we're inthe process of interviewing for designers, we went with a clean and quick design using Bootstrap.

It was a fun project and, just like a map, built to make interesting data easy to use.

Love Design? Join the WNYC Data News Team

Do you want to ...

  Inform the citizens of New York?

  Help people understand their world?

  Root out corruption?

  Make a mark on society?

  Craft beautiful online projects and visualizations?

  ( Like this diversity map, this stop & frisk project and this election tracker? )

WNYC is growing our Data News Team to make high-impact visualizations and projects, and to help WNYC reporters and producers present the facts, expose corruption and explain our world. We've been pioneers in the field of crowdsourcing, data journalism and mapping -- even winning some prestigious awards for our work.

Now we're kicking it up a notch. Like to join us? 

What we have:

  • An award-wining staff of reporters and producers
  • A committed, innovative digital staff
  • A mission to conduct journalism in the public interest
  • Millions of engaged, passionate and active listeners and readers

What you have:

  • A passion for news
  • An attention to detail, a respect for fairness and a hatred of inaccuracy
  • A user-centered approach to exploring information
  • An appreciation for clean lines, clear stories and use of white space
  • A genuine and friendly disposition, and an honest spirit of collaboration
  • A bias toward sharing what you know, and helping others build on it

What you'll do:

  • Huddle with reporters to figure out how we might help their stories with data, design and web technology
  • Work as a team to turn ideas into realities in days or weeks, tops
  • Learn from and build on successes and mistakes along the way
  • Have your work consumed online and talked about on air to millions of New Yorkers

Head over to our official aplication for Interaction Designer and tell us all about you.

The Thinking Behind WNYC's Vertical Timeline

Making a music stand, my father said, was a great challenge: Even though people had made them for centuries, it was still possible to blend beauty and function in a new way.

In journalism, the same is true for the timeline.

Presenting a chronological story online, beautifully and functionally, has been tricky. There are some great examples, such as the New York Times' chronology of the Iraq war, and the three-dimensional Middle East timeline from The Guardian.

ProPublica built the excellent TimelineSetter to put Times-like timelines in the hands of non-Times journalists, and we used it for a while. But TimelineSetter's horizontal layout got cramped in WNYC's article columns, and we longed for something that fit better.

Working with Balance Media and the WNYC web design team, we kicked around several ideas and settled on a vertical version. As it happened, Facebook's new vertical timeline had come out, inspiring a crop of JavaScript libraries we could work with.

We also decided to dispense with a journalistic convention that represented temporal gaps visually -- making months wider than weeks, for example -- and focus, instead, on seeing the sequence of events at once.

The live version at WNYC is here.

We also went with a center-spine orientation, which give it balance and allows the user to see more items at the same time. And the very cool Isotope code reshuffles the items to fit as they are closed, opened, resorted or even added.

Open to use

Finally we wanted it easy for us -- and you -- to use. So we wired it to Google Spreadsheets, allowing reporters and editors to easily enter and update the information. The wiring there is based on a previous project of ours called Tabletop.js.

And we made the source code openly available and scary-easy to use, and you can start by copying this Google spreadsheet template.

We usually build an HTML page just like the one in the code example, and then use a simple line of HTML to iframe it onto an article page. The only trick is to make sure the iframe is tall enough.

The code is free for non-commercial use; commercial use requires a $25 license fee for Isotope.

We hope folks will use the timeline, and come up with improvements. Let us know about either in the comments below or by writing me at john (at) johnkeefe (dot) net.

Making AP Election Data Easy with Fusion Tables

This post is for journalists who use (or would like to use) election data from the Associated Press -- which is a paid service the AP provides. If that describes you, read on!

When Google gave away free, live election data for the Iowa Caucuses, something struck me right away: It was easy.

Data provided by the Associated Press, which drives almost every election site you've ever seen, is notoriously tricky to manage -- a statement I'm confident making based on talks with many election-night veterans and on my own experience.

But Google's results were posted in a public Google Fusion Table, which is basically a spreadsheet on steroids. That meant I could get the data I wanted simply by constructing the correct URL. Votes by county, sorted by county? No problem. Candidate totals for the entire state? Sure. Votes mashed with other data I had? Yup. Formatted in JSON? Bring it.

Instantly. Easily.

(Here are the URLs I used above, and here's the documentation from Google on how to construct them. Hard-to-find tip: Append &jsonCallback=anything to get the json. And if you're using jQuery AJAX calls, make it &jsonCallback=?)

A week later, for the New Hampshire Primary, there were no free Google data. So I made an AP data-fetcher-and-wrangler based on code by Al Shaw. Through no fault of Al's code, my adaptation was slow, complicated and crashed every couple of hours. It worked, but just barely.

Next up was South Carolina, and I was determined to make AP's data friendlier by putting in a Google Fusion Table.

And it worked.

How I did it

In the interest of time and clarity -- and to spark discussion before the primaries are over (or irrelevant) -- I'm leaving out a bunch of the nitpicky details. If you're an AP Elections subscriber and want to try this, contact me at john (at) johnkeefe.net. I'll help you any way I can.

AP provides data in several formats, including a "flat file," which basically is a huge, semicolon-delimited spreadsheet. Each row represents a county, and each column the latest stats for that county, such as precincts reporting and each candidate's total votes.

The flat file doesn't have column headers, though. So I first uploaded AP's South Carolina test table to Google Spreadsheets and added the column names I needed.

I then imported the spreadsheet into a non-public Google Fusion Table.

For election night, I set up a script on my computer that does the following steps every two minutes:

1. Logs into AP's servers via FTP and downloads the flat file.

2. Deletes the data from the Google Fusion Table I made earlier and uploads the entire flat file anew. This is accomplished with a little Python program written by the brilliant (and patient) Kathryn Hurley, of the Google Fusion Tables team. I've posted it here with her permission. I don't know Python, but didn't need to. I just needed to make sure the list of columns in the data_import.py exactly matched the columns in my table. So I cut-and-pasted them from the Google spreadsheet. The script executes the command:

python data_import.py [google account username] [flat file filepath] [fusion table id]

3. Next, it hits the Fusion Table with a simple URL request formatted to return the data I want as JSON. This is the URL I used for getting the county totals.

4. Then it sends that JSON as a file, via FTP, to a subdirectory of my map application on WNYC's servers.

Once a minute, the election map running in the user's browser looks at that data file to get the latest info.

In this way, I completely avoided the need to build and maintain a database. I know there are great database folks out there, but I'm not one of them. The Fusion Table became my database.

Technically, I could skip steps 3 and 4 by simply pointing my map application at the Fusion Table to get the data it needs. That's what I did for Iowa, using the free Google data. But the table would be publicly visisble on Google's servers ... and my reading of the AP contract, understandably, doesn't allow that.

I strongly believe that the easier AP's data is to use, the more budding journocoders will make new election-night interactives. And if we can work together to do that, let's. For me, this method was a lot easier than anything else I had tried before.

A final note: If you're a Python-savvy programmer, be sure to check out what the LA Times has shared to make life easier, too. It's pretty slick.

Screaming for a Map: The New "Littles"

When I saw the NYC ancestry data, I immediately thought, "That screams MAP!"

Brian Lehrer Show producer Jody Avrigan had been working on a great project looking for the new "Littles" in New York City -- neighborhoods where people of a certain ancestry or ethnicity live. He had a spreadsheet; I wanted to visualize it.

The result may be my favorite map project so far:

Mostly, I built on what I'd learned making WNYC's Census Maps, adding a few of things:

• An on-map drop-down menu (here's the CSS code for that).

• Code that selects different data from a single Google Fusion Table

• Panning and zooming to the neighborhood I want to highlight.

• A better "Share or Embed" pop-up box using jquery.alerts.js

I also tried to clean up and refactor my original code to make it easier to read (and reuse).

You can see that code on GitHub. I tried to document it clearly, but post a note below if you have any questions or would like clarification.

UPDATE: In making this map, I used a new (to me) trick to remove the water areas from census tract shapes on the coastline.  Here's how I did it, if you're interested.

A Customized Viewer for DocumentCloud

This post is for newsrooms using DocumentCloud, the fantastic document viewer developed by journalist-programmers at ProPublica and The New York Times.

Want a custom viewer for your site's documents? You can have ours.

I built it so that once set up, this viewer will automagically fill in the title, source and "back-to-article" link based on information already associated with the document -- so one file serves all of your documents.

Here's how.

One-time Setup

You can make this work with a little knowledge of html and access to a web server. You'll need to host a single html page, called dc.html and a tiny javascript file, called jquery.url.min.js.

1. Download the html code for dc.html by right-clicking on this link (or view it here).

2. Use any text editor to edit the path to your logo image on line 101. (A logo that's 60 pixels high works well).

3. On line 101, change "www.wnyc.org" to your site's home page

4. Upload the file dc.html to a web server.

To extract the document info from the URL, the page uses a little JavaScript program called jquery.url.min.js which you can read about here and download here. Once you do:

5. Upload jquery.url.min.js to your web server (the page assumes it's in the js/ subdirectory)

6. If you need to change the location of jquery.url.min.js, edit the path on line 38 of the html code and re-upload.

Using the Viewer

To use the viewer, simply construct a link to it that combines dc.html's location and the ID of the document you want it to load. For example, the base URL for the WNYC's version of dc.html is here:

http://project.wnyc.org/documents/dc.html

And the document I want to display is here:

https://www.documentcloud.org/documents/11275-bill-a11354.html

I combine them into a new link by taking the base URL, adding "?doc=" and then adding the document ID -- which, here, is 11275-bill-a11354 (omitting the .html .) Like this:

http://project.wnyc.org/documents/dc.html?doc=11275-bill-a11354

Voila.

Pages and Annotations

For extra trickiness, you can jump to specific page numbers and annotations by adding references to them into your link. Here you need to append "#document/p" and the page number. So for page 2, you'd use:

http://project.wnyc.org/documents/dc.html?doc=11275-bill-a11354#document/p2

And for the annotation on page 3, it would be:

http://project.wnyc.org/documents/dc.html?doc=11275-bill-a11354#document/p3/a3975

(You get the annotation number -- and the whole phrase after the #, actually -- by clicking on the little "link" icon next to the annotation's title.)

That's it. 

Credits and Disclaimers

The base design is built on code the Chicago Tribune News Apps Team wrote, which I modified with help from the DocumentCloud folks to dynamically take up the title, source and related-story information from the document's metadata.

Note that the version of dc.html at project.wnyc.org contains extra tracking code specific to our servers. The version here does not. It's the one you should download.

And I don't warrant in any way that this is perfect code, so please use at your own risk.

If you modify it -- especially if you improve on what's here -- please let me know and I'll share the updates here and on GitHub.

Making the WNYC Census Map

When the New York census numbers arrived this week, we were ready. WNYC quickly published an interactive, sharable map so New Yorkers -- and our reporters -- could explore the new data and see the stories.

We built the map with free tools and timely help from some smart, kind people.

<p>scrolling="no">

The short story is that we mashed together population numbers and geographic shapes using Google Fusion Tables, and then used JavaScript and Fusion Tables' mapping features to make things pretty and interactive.

The long story is meandering and full of wrong turns. But here are the highlights, should anyone need a little navigation. Don't hesitate to contact me for more help and insight; I'm due to pay some forward.

Getting in Shape

First up: Shapes of the census tracts plotted on Earth. I downloaded New York's tracts from the U.S. Census Bureau's TIGER/Line Shapefile page. They also have counties, blocks, zip codes, and more.

Then I uploaded this "shapefile" -- actually collection of related files zipped together -- to Fusion Tables with a free, online tool called Shpescape. (Thanks to Google's Rebecca Shapley for sharing this key to my puzzle.)

Hello, Data

Census data is publicly available, but can be a hassle to handle. In fact, on the day each state's info was released, the files were available in a set that apparently requires one of two pricey programs -- SAS or Microsoft Access -- to assemble. 

So I got clean, assembled, comma-delimited files -- complete with 2000-to-2010 comparisons -- from the USA Today census team, which provided them as a courtesy to members of Investigative Reporters and Editors. Huge props to Anthony DeBarros and Paul Overberg, who crunched the New York numbers in a blazing 30 minutes.

By the way, IRE membership is $60 for professionals and $25 for students. Well worth it, and cheaper than either of those programs. If you're digging into census numbers and qualify, I recommend this route.

That said, every state's 2010 data is now available free from the Census Bureau's American Fact Finder. Navigating the site is a little tricky, and worth a separate post, but the bureau provides some tutorials, and there's very detailed PDF about each data field.

With data in hand, I uploaded it to Google Fusion Tables in another table.

Map Making

Next, I merged the shapes table and population table, using the unique tract ID to marry the data (the shape file calls it GEOID10. the IRE data calls it FULLTRACT). Note that the GEOID10 is formatted differently depending on whether you're using tracts, blocks, counties, etc., so be sure you've got the right match in both files.

Clicking Visualize -> Map shows a map. It'll be all default-red until you click on Configure Styles -> Fill Colors -> Gradient (or ->Buckets) and make different colors appear depending on values in the column of your choice.

Using the Share button makes the map viewable by others, and "Get embeddable link" does just that.

Adding Prettiness

I used the Fusion Tables "Configure info window" option to make custom pop-up bubbles on our maps. This actually required some nicer-looking data, such as a columns with rounded percentages and + or – signs. I added these using the free R statistical program, which I learned how to do from The New York Times' Amanda Cox at the 2011 Computer Assisted Reporting conference.)

Census tracts officially extend to the state lines, which made it look like a lot of people live in the Hudson River. So I had trimmed those tracts to the shorelines with a free mapping program called QGIS, using water shapefiles as a reference (those are here, in the drop-down menu).

After creating 12 merged Fusion Tables, I pulled them into one page using JavaScript and jQuery, with fantastic guidance from Joe Germuska at the Chicago Tribune (part of the team that built this great map).

The "Share/Embed this view" feature came together in two parts: 1) The JavaScript turns the current the latitude, longitude, zoom level and current map choice into a long URL that pops up when you click the Share/Embed link. 2) Using a nifty jQuery plug-in (updated link Dec. 2011), the map looks for those values in the URL that summoned it, and reorients to that map if they exist.

Prep Work

Clearly, not all of this could happen in a couple of hours on Data Day. I'd been tinkering, testing and tweaking for a few weeks using New Jersey's data, which came out much earlier.

I also wrote down, edited and revised every step I took to make the maps. So when the adrenaline was running I had a script to follow.

The WNYC Web Team also set up a slick, fresh project server, at project.wnyc.org, to host the html pages and track the traffic.

Fusion Function

Using Google Fusion Tables made it super easy to manage, map and serve up a lot of data. And the FT feedback team was fantastic about responding to questions and glitches I encountered along the way.

I did run into a couple of hiccups: slow load times and pop-up bubbles that failed to pop up. The first was a product of displaying so much data -- and I knew I was pushing things. The second was a Google glitch that their engineers managed to fix within a few hours, but was still spotty at times afterward.

Also, the Google Map engine starts dropping shapes when there are too many to show. So I funneled different counties' data into almost a dozen different layers, a workaround the Google folks showed me ahead of time.

That said, I had time to code and tweak lots of neat things because I didn't have to focus on building or running a database engine. Google's free services took care of that.

What Could Be Better?

Probably a lot. I wanted to let people to add comments, right on the map, but didn't have the chops or time to pull that off.

Another good thing would be a "Loading ..." indicator displayed while the map data is pulled into your browser, which I may yet add.

But what couldn't have been better was everything I learned, the help I got from other data folks and the support from my WNYC colleagues. Plus we gave New Yorkers a pretty nifty service and several great stories.

Need more details? Feel free to ask questions in the comments. Or drop me a line. I'll try to help, too.

Where's the Next Bus? I'll Tell You

When New York City released real-time bus info for a Brooklyn line, one of my colleagues wasn't happy.

Yes, she could use a smartphone to see buses on an MTA map. Yes, she could get location information by texting the code for her stop (she lives near the route). But none of this was simple enough.

"I want a phone number that will TELL me when the next bus is coming," she said.

I'll have it for you by the end of the day, I replied.

It was a bit of a gamble. More of a challenge to myself, really. But if "before midnight" counts as the end of the day, then I succeeded. Try it:

Dial 646-480-7193. When prompted, enter 308333 or any of the bus stop codes for the B63 line.

It's not journalism, but it is a working example of how someone can take public data and turn it into a useful tool, quickly. And at almost no cost.

How I Did It

First, I wrote a little program in Sinatra that sends a 6-digit bus stop code to the NYC Metropolitan Transit Authority API -- or application programming interface -- whenever someone hits my program's web address. (The API is a public portal to the live bus data. All you need is a little programmer know-how and a free key from the MTA. The technical details are right here.)

The API sends back 77 lines of information about the stop and the buses approaching it, including the one detail I want:

<StopsFromCall>3</StopsFromCall>

The next bus is three stops away. I use a Ruby tool called Nokogiri to find and extract just this number, which I drop into an amazingly simple web page. The page's entire output looks like this:

The next bus to arrive at 14th Street and Fifth Avenue heading north is 3 stops away.

That accomplished, I bought a phone number from Twilio for $1 a month, and $0.01/call. Twilio provides a telephone connection to web-based programs, which I first heard about during a demo at a TimesOpen event

I set my new phone number to hit my program's URL whenever someone calls. By wrapping the text in special tags, Twilio recognizes it as a cue to talk:

<Response>
<Say>
The next bus to arrive at 14th Street and Fifth Avenue heading north is 3 stops away.
</Say>
</Response> 

Voila.

I later used Twilio's <Gather> tags to build what's essentially a web form to capture digits entered on a phone for any bus stop. I also added error catching, for when no buses are coming, and programmed it to announce the location of the following bus if the next bus is arriving.

Some Hiccups

Turns out that Twilio reads text a little too fast to be understood on an NYC street corner. So I rewrote the output to introduce pauses:

The next bus to arrive at ... 14th Street ... and ... Fifth Avenue ... heading ... north ... is ... 3 ... stops ... away.

Also, there's a bug in the API system that sends back server errors in certain conditions. Word is that the MTA has actually fixed this on their development servers, and that fix is being pushed to the public system pretty soon. 

More to Come

I have a few enhancements up my sleeve, which should be done in a week or two. If you'd like to know when new tricks roll out, drop a note with "Bus Talk" in the subject line to john (at) johnkeefe.net. I'd love to hear your thoughts on how it could work better, too.

I'll also release the source code after those nifty updates. Let me know, too, if you're interested in that.

Photo by jbrau13 on flickr.

Weaving a Patchwork Map ... in Real Time

We did something a little creative and unique at WNYC this past election night: We mapped the vote by "community type."

This revealed the diversity of the vote across New York State -- from the cities to the suburbs, boom towns and "service worker" centers -- in real time, on the air and on the WNYC home page.

And the diversity is striking. Despite Democratic wins in every statewide race, the Republicans running for state attorney general and comptroller "won" every community type outside "Industrial Metropolis" and "Campus & Careers" counties.

Patchwork Nation's Dante Chinni talked about this on air during WNYC's coverage election night, and has written more about it since.

The live map was a mashup of Patchwork Nation's unique take on the nation and the Associated Press's live vote totals. At the request of WNYC, Patchwork Nation programmers dove into the AP test results and quickly wove them into a new map based on PN's existing county maps -- customizing them for the event and adding real-time percentages by community type.

Bringing the Threads Together

In the months before the election, I had wondered how we might better understand the early returns -- those that come in typically between 9 p.m. and 10:30 p.m. -- which often don't match the final results. I wanted more clarity.

At a Hacks/Hackers Open-Source-a-Thon, I started playing with the election data with help from Al Shaw (then at TalkingPointsMemo, now at ProPublica) and Chrys Wu (of Hacks/Hackers and ONA fame).

That evolved into a little program I wrote in Sinatra that generated vote-total map at the left, shading counties darker as more of their precincts reported. It also helped me better understand how the data were structured, how to retrieve the numbers and what it might take to make a live map.

So when Chinni asked if WNYC had any county-level data sets we'd like to put through the Patchwork Nation treatment, I had the perfect candidate.

Open All Night: The Great Urban Hack NYC

For 26 hours this weekend, a bunch of journalists and coders got together to make lots of great things designed to help the citizens of New York City.

My blog post summarizing the event and all of the resulting projects is upon Hacks/Hackers.

I helped Jenny 8. Lee, Chrys Wu and Stephanie Pereira organize the event. Then I joined a team working with digital heaps of NYC taxi trip data to make data visualizations and start some other projects. My favorite one is here (with a detail below), which is a representation of taxi usage for 24 hours, set around a clock. Beautiful. It was built by Zoe Fraade-Blanar using Processing and data crunched by the other teammates.

Click image for full view

I came away from the event with many new connections, excitement about learning Processing, some more skills in Sinatra and a note to check out Bees with Machine Guns(!)