When the New York census numbers arrived this week, we were ready. WNYC quickly published an interactive, sharable map so New Yorkers -- and our reporters -- could explore the new data and see the stories.
We built the map with free tools and timely help from some smart, kind people.
The long story is meandering and full of wrong turns. But here are the highlights, should anyone need a little navigation. Don't hesitate to contact me for more help and insight; I'm due to pay some forward.
Getting in Shape
First up: Shapes of the census tracts plotted on Earth. I downloaded New York's tracts from the U.S. Census Bureau's TIGER/Line Shapefile page. They also have counties, blocks, zip codes, and more.
Then I uploaded this "shapefile" -- actually collection of related files zipped together -- to Fusion Tables with a free, online tool called Shpescape. (Thanks to Google's Rebecca Shapley for sharing this key to my puzzle.)
Census data is publicly available, but can be a hassle to handle. In fact, on the day each state's info was released, the files were available in a set that apparently requires one of two pricey programs -- SAS or Microsoft Access -- to assemble.
So I got clean, assembled, comma-delimited files -- complete with 2000-to-2010 comparisons -- from the USA Today census team, which provided them as a courtesy to members of Investigative Reporters and Editors. Huge props to Anthony DeBarros and Paul Overberg, who crunched the New York numbers in a blazing 30 minutes.
By the way, IRE membership is $60 for professionals and $25 for students. Well worth it, and cheaper than either of those programs. If you're digging into census numbers and qualify, I recommend this route.
That said, every state's 2010 data is now available free from the Census Bureau's American Fact Finder. Navigating the site is a little tricky, and worth a separate post, but the bureau provides some tutorials, and there's very detailed PDF about each data field.
With data in hand, I uploaded it to Google Fusion Tables in another table.
Next, I merged the shapes table and population table, using the unique tract ID to marry the data (the shape file calls it GEOID10. the IRE data calls it FULLTRACT). Note that the GEOID10 is formatted differently depending on whether you're using tracts, blocks, counties, etc., so be sure you've got the right match in both files.
Clicking Visualize -> Map shows a map. It'll be all default-red until you click on Configure Styles -> Fill Colors -> Gradient (or ->Buckets) and make different colors appear depending on values in the column of your choice.
Using the Share button makes the map viewable by others, and "Get embeddable link" does just that.
I used the Fusion Tables "Configure info window" option to make custom pop-up bubbles on our maps. This actually required some nicer-looking data, such as a columns with rounded percentages and + or – signs. I added these using the free R statistical program, which I learned how to do from The New York Times' Amanda Cox at the 2011 Computer Assisted Reporting conference.)
Census tracts officially extend to the state lines, which made it look like a lot of people live in the Hudson River. So I had trimmed those tracts to the shorelines with a free mapping program called QGIS, using water shapefiles as a reference (those are here, in the drop-down menu).
Clearly, not all of this could happen in a couple of hours on Data Day. I'd been tinkering, testing and tweaking for a few weeks using New Jersey's data, which came out much earlier.
I also wrote down, edited and revised every step I took to make the maps. So when the adrenaline was running I had a script to follow.
The WNYC Web Team also set up a slick, fresh project server, at project.wnyc.org, to host the html pages and track the traffic.
Using Google Fusion Tables made it super easy to manage, map and serve up a lot of data. And the FT feedback team was fantastic about responding to questions and glitches I encountered along the way.
I did run into a couple of hiccups: slow load times and pop-up bubbles that failed to pop up. The first was a product of displaying so much data -- and I knew I was pushing things. The second was a Google glitch that their engineers managed to fix within a few hours, but was still spotty at times afterward.
Also, the Google Map engine starts dropping shapes when there are too many to show. So I funneled different counties' data into almost a dozen different layers, a workaround the Google folks showed me ahead of time.
That said, I had time to code and tweak lots of neat things because I didn't have to focus on building or running a database engine. Google's free services took care of that.
What Could Be Better?
Probably a lot. I wanted to let people to add comments, right on the map, but didn't have the chops or time to pull that off.
Another good thing would be a "Loading ..." indicator displayed while the map data is pulled into your browser, which I may yet add.
But what couldn't have been better was everything I learned, the help I got from other data folks and the support from my WNYC colleagues. Plus we gave New Yorkers a pretty nifty service and several great stories.
Need more details? Feel free to ask questions in the comments. Or drop me a line. I'll try to help, too.