Once Upon a Datum: Mapmaking on News Time

In September, I shared how WNYC makes news maps during a talk at the Online News Association conference.

UPDATE: ONA posted a video of this presentation, which I've embedded here:

'Once Upon a Datum': Telling Visual Stories from Online News Association on Vimeo.

Here are my presentation slides (PDF), and here's a list of links to pages and sites I discussed in my talk:

Same-Sex Couples in NYC: http://www.wnyc.org/articles/wnyc-news/2011/jul/14/census-shows-rising-number-gay-couples-and-dominicans/

Hispanic Origins in NYC: http://www.wnyc.org/articles/wnyc-news/2011/jul/14/census-shows-rising-number-gay-couples-and-dominicans/

The New Littles: http://www.wnyc.org/shows/bl/clusters/2011/jun/02/june-guest-andrew-beveridge-and-new-littles/

Marijuana Arrests: http://www.wnyc.org/articles/wnyc-news/2011/apr/27/alleged-illegal-searches/

Contributions by Zip Code: http://empire.wnyc.org/2011/07/where-are-the-mayoral-candidates-raising-their-money/

Dollars in a District: http://empire.wnyc.org/2011/09/the-54th-assembly-campaign-contribution-breakdown-where-have-all-the-in-district-donors-gone/

NYC Hurricane Evacuation Map: http://wny.cc/EvacZones

NYC DataMine: http://www.nyc.gov/html/datamine/html/data/geographic.shtml

Shpescape: http://www.shpescape.com/

Hurricane Zones Fusion Table: http://www.google.com/fusiontables/DataSource?dsrcid=964884

Fusion Tables Layer Builder: http://j.mp/FusionBuilder or http://gmaps-samples.googlecode.com/svn/trunk/fusiontables/fusiontableslayer_builder.html

Layer-wizard map from presentation: http://dl.dropbox.com/u/466610/preso-map.html

 


Mapping Dollars in a District

I loved this challenge.

WNYC's Colby Hamilton wanted to know: How much money was being raised by candidates for a state legislative district from within the district itself?

Answer: Very little.

Making this Map

This wasn't my typical "just upload it to Fusion Tables" project. It got geeky quickly, intentionally.

My method involved a PostgreSQL / PostGIS database and QGIS mapping software. Everything is free, which is amazing, yet they take some advanced tinkering -- especially the database stuff.

First, I geocoded the donation addresses, getting each one's latitude and longitude, using this nifty batch geocoder. The donor's name and donation amount were also on each line.

Then I fed the data into my PostgreSQL database and pulled it into QGIS (they talk nicely together). I also layered in a shapefile of the district from the US Census Bureau.

I then asked QGIS where the donations and the district "intersect" -- and spit out the resulting shapefile for each candidate. 
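For anyone curious what that "intersect" question amounts to, it is a point-in-polygon test run on every donation. QGIS and PostGIS did the real work on this project; the following is only a minimal pure-Python sketch of the idea, using ray casting. The rectangular "district" and the three donation rows are made-up sample values, not the real 54th District data.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray-casting test: True if (lng, lat) falls inside polygon.

    polygon is a list of (lng, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that horizontal line
            cross_x = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < cross_x:
                inside = not inside
    return inside


# Hypothetical district outline (a plain rectangle) and sample donations
district = [(-73.95, 40.68), (-73.90, 40.68), (-73.90, 40.71), (-73.95, 40.71)]
donations = [
    {"donor": "A", "amount": 100.0, "lng": -73.92, "lat": 40.69},
    {"donor": "B", "amount": 250.0, "lng": -73.99, "lat": 40.75},
    {"donor": "C", "amount": 50.0, "lng": -73.91, "lat": 40.70},
]

# Keep only the donations that fall inside the district, then total them
in_district = [d for d in donations
               if point_in_polygon(d["lng"], d["lat"], district)]
in_district_total = sum(d["amount"] for d in in_district)
```

A real district boundary has far more vertices and may have holes or multiple parts, which is a good reason to lean on PostGIS or QGIS for anything beyond a quick check.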

Next I uploaded each candidate's "intersection" shapefile and their all-donations shapefile to Google Fusion Tables using shpescape. Once there, I used Fusion Tables' aggregation feature to total the donations in the district (the intersection).

Fusion Tables also allowed me to plot all of the donations, and also the shape of the district. (Little trick: I actually copied the "geography" cell from the 54th District table and added it as a new row to the donations table. That way the donations and the district shape appear at the same time.)

Finally, I put the layers together into a map template I've grown since building 2010 Census maps.

You'll notice I'm not diving deep into the details here, but if you're looking into a similar project, drop me a note at john at johnkeefe dot net or look at this page, where I share every tidbit, command and SQL "select" statement I used.

Coulda Just Used Fusion Tables

The truth is, I could have used only Fusion Tables. The number of donations within the district turned out to be so small -- 69 in total -- I could have simply uploaded the donations into Fusion Tables, letting it do the geocoding and the drawing of points and the district shape.

Then it's just a matter of clicking on every dot within the pink lines, adding up the donations in each bubble along the way.

Instead, I've created a process to do more complicated inside-an-area calculations. And to help others do them, too.

Making the NYC Evacuation Map

A couple of years ago, I had our WNYC engineers use a plotter to print out this huge evacuation map PDF. Seemed like a good thing for the disaster-planning file. Just in case.

Then, back in June of this year, I was browsing the NYC DataMine (like you do), and realized New York City had posted a shapefile for the colored zones on that map.

UPDATE (Feb. 11, 2012): NYC has nicely revamped the DataMine since the summer Irene struck -- even mapping geographic files like this right in the browser. But it's actually trickier to find the shapefiles now. Here's the hurricane zones dataset. Click "About" and scroll down to "Attachments" for the .zip file containing the shapefiles. Or just use this shortcut.

I knew I could use the shapefile to make a zoomable Google map -- which would be a heckuvalot easier to use than the PDF. So I imported the shapefile into a Google fusion table. (It's super easy to do; check out this step-by-step guide.) Next, I added that table as a layer in a Google Map and tacked on an address finder I'd developed for WNYC's census maps.

Then I tucked the code away on my computer. Just in case.

Fast-forward to Thursday morning, as Irene approached. On the subway in to work, I polished the map and added a color key. It was up on WNYC.org by midmorning, long before the Mayor ordered an evacuation of Zone A.

When the order was announced, I used another fusion table to add evacuation center locations, updating that list with info from New York City's Chief Digital Officer Rachel Sterne. (The dots are gone now, since the sites are closed.)

I'm not at liberty to reveal traffic numbers, but the site where we host our maps received, um, a lot more views than it usually does. By orders of magnitude. Huge props to the WNYC.org digital team for keeping the servers alive.

Tracking a Hurricane

As Hurricane Irene was approaching Puerto Rico, I noticed that the National Hurricane Center posts mapping layers for each element in their storm-track forecast maps.

Since their zoomable maps aren't embeddable, I made one that is. Feel free to use it:

Right now, I'm manually updating the map with new layers as they are issued, every three hours. I'm pretty close to having a script ready to handle that for me, based on information in the storm's RSS feed.
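One way such an updater might start is by pulling layer links out of the feed. This is only a rough sketch under loud assumptions: it assumes each feed item carries a link ending in .kmz, and the sample feed, titles and example.com URLs below are invented for illustration, not copied from the National Hurricane Center.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; the real feed's items and URLs will differ
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Forecast Track</title><link>http://example.com/track_015.kmz</link></item>
<item><title>Public Advisory</title><link>http://example.com/advisory.html</link></item>
<item><title>Forecast Cone</title><link>http://example.com/cone_015.kmz</link></item>
</channel></rss>"""


def kmz_links(feed_xml):
    """Return the links of all feed items that point at .kmz files."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link.lower().endswith(".kmz"):
            links.append(link)
    return links


latest_layers = kmz_links(SAMPLE_FEED)
```

A scheduled job could compare this list with the layers already on the map and swap in anything new.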

In the process of building this map, I learned how to use "listeners" to dictate the order the layers are rendered. For anyone trying to work that out, here's the code for how I did it.

Mapping Campaign Contributions on the Fly

Our new Empire Blog reporter, Colby Hamilton, dropped by my desk the other day wondering whether we could map contributions to presumptive NYC mayoral candidates by zip code.

He was going to post about it after lunch. I said I'd be ready with a map.

Thanks to Fusion Tables and a little Ruby magic, I had one ready when his story was done shortly after lunch, and we updated it into the evening as the candidates' filings were made available by the NYC Campaign Finance Board.

How I Did It

For anyone looking to do something similar, here's what I did:

-- I downloaded each candidate's donations as an Excel spreadsheet from the homepage of the Campaign Finance Board.

-- I uploaded the spreadsheet to Google Fusion Tables (if it's an Excel file more than 1MB, you have to save it as comma-separated-values, or .CSV, before uploading).

-- I used Fusion Tables' fantastic aggregation function -- View -> Aggregate -- to sum the contributions by zip code. Then I exported that file as a .CSV, which gave me a file for each candidate with the columns: ZIP, SUM and COUNT -- SUM being the total donations and COUNT the number of donations for the zip code.

-- I re-uploaded that aggregation export back to Fusion Tables. (If anyone knows how to save an aggregation in FT without exporting it and uploading it again, I'm all ears.)

-- Now that I had the contributions by zip code, I needed the zip code shapes to go with them. The US Census has zip code shapefiles by state FIPS code, and for the entire United States. (Quick note: Census zip code data and US Postal Service zip codes aren't exactly the same, though we felt comfortable using the Census version for this project.)

-- I uploaded the New York state zip code shapefile to Fusion Tables, too, using Shpescape. (If you're working with New York State, you can save some work and just use mine.)

-- I opened the ZIP-SUM-COUNT file in Fusion Tables and merged it with the zip code shapefile, linking them with the ZIP field in the first file and ZCTA5CE10 in the second file.

-- Using Visualize -> Map, I could see all of the relevant zip codes for that candidate. By using the Configure Styles link, and then tinkering with Fill Color -> Buckets settings, I shaded the map according to total donations.

This map is ready to be embedded!
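If you'd rather skip the export-and-reupload dance for the aggregation step, the same ZIP, SUM and COUNT table can be produced locally. Here's a minimal sketch of that idea, not what was used for this story. The column names NAME, ZIP and AMNT and the three sample rows are assumptions; swap in whatever headers the downloaded spreadsheet actually has.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for one candidate's donation export
sample_csv = """NAME,ZIP,AMNT
Donor One,10001,250
Donor Two,10001,100
Donor Three,11201,500
"""


def aggregate_by_zip(csv_file, zip_col="ZIP", amount_col="AMNT"):
    """Return {zip: {"SUM": total dollars, "COUNT": number of donations}}."""
    totals = defaultdict(lambda: {"SUM": 0.0, "COUNT": 0})
    for row in csv.DictReader(csv_file):
        bucket = totals[row[zip_col].strip()]
        bucket["SUM"] += float(row[amount_col])
        bucket["COUNT"] += 1
    return dict(totals)


by_zip = aggregate_by_zip(io.StringIO(sample_csv))
```

To use a real file, pass an open file handle instead of the io.StringIO wrapper, then write the result out as a .CSV for uploading.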

The Trouble with Tables

An admission, though: I didn't use the Fusion Tables embeddable map for this story. I did share the FT map with Colby, which let us know we had a couple of good stories. FT is great and fast for that. It also works in production with smaller data sets.

But the long time it takes Fusion Tables to populate a map with large data sets can make for a frustrating user experience. That's compounded by the fact there's no way (yet) to "listen" for a sign that the layer has fully loaded, which would let me display a "Please wait ..." sign until it did.

So in this case, I built my own KML, or Keyhole Markup Language, file (5 of them, actually; one for each candidate). I then compressed those files into much smaller KMZ files, which are just "zipped" KML files, so they load quickly. I then used those files as layers with Google Maps' KmlLayer() constructor. I also used a "listener" to find out when the layer is fully loaded, and display an alert to the user until it is.
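Since a KMZ is just a zipped KML, the compression step is easy to script. Below is a small sketch using Python's standard zipfile module; the tiny placeholder KML and the file name candidate.kmz are invented for the example. Naming the inner file doc.kml follows the usual KMZ convention.

```python
import os
import tempfile
import zipfile


def kml_to_kmz(kml_text, kmz_path):
    """Write kml_text into a compressed KMZ archive at kmz_path."""
    with zipfile.ZipFile(kmz_path, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        # By convention, the main document inside a KMZ is named doc.kml
        kmz.writestr("doc.kml", kml_text)
    return kmz_path


# Placeholder KML; a real file would hold the zip code shapes and styles
kml = '<kml xmlns="http://www.opengis.net/kml/2.2"><Document></Document></kml>'
kmz_file = kml_to_kmz(kml, os.path.join(tempfile.mkdtemp(), "candidate.kmz"))
```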

More to Come

As for how I built the KML file, I'm going to share that in another post once I clean up the Ruby code I used to automate the process. (If your project can't wait for that post, drop me a note and I'll try to help.)

But the basics are these:

1. The layout of a KML file, and the format for using different styles to color different shapes, is pretty straightforward and nicely documented. In my code, I changed the style name for a given shape based on the value of the "SUM" variable.

2. The hardest part of writing a KML file is defining each shape in the proper format. But the merged file I made linking the ZIP-SUM-COUNT data and the shapefile actually has that information! The "geometry" column of that table is straight KML! (Thank you, Shpescape.) Export that merged file as a .CSV, and you've got all of the building blocks for your map.
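Until the Ruby code is posted, here is a rough Python sketch of those two basics: pick a style name from the SUM value, then wrap each row's ready-made geometry in a Placemark. It is an illustration only, not the original script. The dollar thresholds, style names and colors are invented, and the geometry strings are assumed to be valid KML, such as a Polygon element taken from the merged table's geometry column.

```python
from xml.sax.saxutils import escape

# Hypothetical dollar floors and style names, highest bucket first
BUCKETS = [(10000, "high"), (1000, "medium"), (0, "low")]
# KML colors are aabbggrr hex: alpha, blue, green, red
STYLE_COLORS = {"high": "cc00008b", "medium": "cc0000ff", "low": "cc9999ff"}


def style_for(total):
    """Return the style name of the first bucket whose floor the total meets."""
    for floor, name in BUCKETS:
        if total >= floor:
            return name
    return BUCKETS[-1][1]


def build_kml(rows):
    """Build a KML document from (zip_code, total, geometry_kml) tuples."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<kml xmlns="http://www.opengis.net/kml/2.2">',
             "<Document>"]
    # One shared style per bucket, referenced by each shape's styleUrl
    for name, color in STYLE_COLORS.items():
        parts.append('<Style id="%s"><PolyStyle><color>%s</color></PolyStyle></Style>'
                     % (name, color))
    for zip_code, total, geometry_kml in rows:
        parts.append("<Placemark><name>%s</name><styleUrl>#%s</styleUrl>%s</Placemark>"
                     % (escape(zip_code), style_for(total), geometry_kml))
    parts.extend(["</Document>", "</kml>"])
    return "\n".join(parts)


# Made-up triangle standing in for a real zip code shape
shape = ("<Polygon><outerBoundaryIs><LinearRing><coordinates>"
         "-74.0,40.7,0 -73.9,40.7,0 -73.9,40.8,0 -74.0,40.7,0"
         "</coordinates></LinearRing></outerBoundaryIs></Polygon>")
kml_doc = build_kml([("10001", 15000.0, shape), ("11201", 250.0, shape)])
```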

If you have ideas, improvements or questions about this post, please don't hesitate to drop a note in the comments or drop me a note via email.

Screaming for a Map: The New "Littles"

When I saw the NYC ancestry data, I immediately thought, "That screams MAP!"

Brian Lehrer Show producer Jody Avirgan had been working on a great project looking for the new "Littles" in New York City -- neighborhoods where people of a certain ancestry or ethnicity live. He had a spreadsheet; I wanted to visualize it.

The result may be my favorite map project so far:

Mostly, I built on what I'd learned making WNYC's Census Maps, adding a few things:

• An on-map drop-down menu (here's the CSS code for that).

• Code that selects different data from a single Google Fusion Table

• Panning and zooming to the neighborhood I want to highlight.

• A better "Share or Embed" pop-up box using jquery.alerts.js

I also tried to clean up and refactor my original code to make it easier to read (and reuse).

You can see that code on GitHub. I tried to document it clearly, but post a note below if you have any questions or would like clarification.

UPDATE: In making this map, I used a new (to me) trick to remove the water areas from census tract shapes on the coastline.  Here's how I did it, if you're interested.

Making the WNYC Census Map

When the New York census numbers arrived this week, we were ready. WNYC quickly published an interactive, sharable map so New Yorkers -- and our reporters -- could explore the new data and see the stories.

We built the map with free tools and timely help from some smart, kind people.


The short story is that we mashed together population numbers and geographic shapes using Google Fusion Tables, and then used JavaScript and Fusion Tables' mapping features to make things pretty and interactive.

The long story is meandering and full of wrong turns. But here are the highlights, should anyone need a little navigation. Don't hesitate to contact me for more help and insight; I'm due to pay some forward.

Getting in Shape

First up: Shapes of the census tracts plotted on Earth. I downloaded New York's tracts from the U.S. Census Bureau's TIGER/Line Shapefile page. They also have counties, blocks, zip codes, and more.

Then I uploaded this "shapefile" -- actually a collection of related files zipped together -- to Fusion Tables with a free, online tool called Shpescape. (Thanks to Google's Rebecca Shapley for sharing this key to my puzzle.)

Hello, Data

Census data is publicly available, but can be a hassle to handle. In fact, on the day each state's info was released, the files were available in a set that apparently requires one of two pricey programs -- SAS or Microsoft Access -- to assemble. 

So I got clean, assembled, comma-delimited files -- complete with 2000-to-2010 comparisons -- from the USA Today census team, which provided them as a courtesy to members of Investigative Reporters and Editors. Huge props to Anthony DeBarros and Paul Overberg, who crunched the New York numbers in a blazing 30 minutes.

By the way, IRE membership is $60 for professionals and $25 for students. Well worth it, and cheaper than either of those programs. If you're digging into census numbers and qualify, I recommend this route.

That said, every state's 2010 data is now available free from the Census Bureau's American Fact Finder. Navigating the site is a little tricky, and worth a separate post, but the bureau provides some tutorials, and there's a very detailed PDF about each data field.

With data in hand, I uploaded it to Google Fusion Tables in another table.

Map Making

Next, I merged the shapes table and population table, using the unique tract ID to marry the data (the shape file calls it GEOID10; the IRE data calls it FULLTRACT). Note that the GEOID10 is formatted differently depending on whether you're using tracts, blocks, counties, etc., so be sure you've got the right match in both files.
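A quick pre-flight check can catch ID mismatches before the merge. The sketch below relies on one known fact, that a tract-level GEOID is 11 characters (2-digit state FIPS, 3-digit county FIPS, 6-digit tract code). Everything else is illustrative: the helper names are made up, the sample IDs are invented, and whether your data file's ID column already comes in this format is something to verify, not assume.

```python
def tract_geoid(state_fips, county_fips, tract_code):
    """Build an 11-character tract GEOID: state (2) + county (3) + tract (6)."""
    return "%s%s%s" % (str(state_fips).zfill(2),
                       str(county_fips).zfill(3),
                       str(tract_code).zfill(6))


def unmatched_ids(shape_ids, data_ids):
    """Return (IDs only in the shapes table, IDs only in the data table)."""
    shape_set, data_set = set(shape_ids), set(data_ids)
    return sorted(shape_set - data_set), sorted(data_set - shape_set)


# Invented sample IDs for illustration
shapes = ["36061000100", "36061000201"]
data = ["36061000100", "36047000100"]
only_in_shapes, only_in_data = unmatched_ids(shapes, data)
```

If either list comes back long, the two tables are probably keyed at different levels (say, tracts versus blocks) or one of them lost its leading zeros in a spreadsheet.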

Clicking Visualize -> Map shows a map. It'll be all default-red until you click on Configure Styles -> Fill Colors -> Gradient (or ->Buckets) and make different colors appear depending on values in the column of your choice.

Using the Share button makes the map viewable by others, and "Get embeddable link" does just that.

Adding Prettiness

I used the Fusion Tables "Configure info window" option to make custom pop-up bubbles on our maps. This actually required some nicer-looking data, such as columns with rounded percentages and + or – signs. (I added these using the free R statistical program, which I learned how to do from The New York Times' Amanda Cox at the 2011 Computer Assisted Reporting conference.)

Census tracts officially extend to the state lines, which made it look like a lot of people live in the Hudson River. So I trimmed those tracts to the shorelines with a free mapping program called QGIS, using water shapefiles as a reference (those are here, in the drop-down menu).
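The nicer-looking columns were made in R, but the idea carries over to any language. As a small illustration in Python (not the R code used here), this hypothetical helper turns a 2000 count and a 2010 count into a rounded, signed percentage string suitable for a pop-up bubble.

```python
def percent_change_label(old, new):
    """Return a rounded, signed percent change such as '+12%' or '-8%'."""
    if old == 0:
        return "n/a"  # percent change is undefined from a zero base
    change = round((new - old) / old * 100)
    return "%+d%%" % change


# Made-up tract populations for 2000 and 2010
growth = percent_change_label(1000, 1120)
decline = percent_change_label(500, 460)
```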

After creating 12 merged Fusion Tables, I pulled them into one page using JavaScript and jQuery, with fantastic guidance from Joe Germuska at the Chicago Tribune (part of the team that built this great map).

The "Share/Embed this view" feature came together in two parts: 1) The JavaScript turns the current latitude, longitude, zoom level and current map choice into a long URL that pops up when you click the Share/Embed link. 2) Using a nifty jQuery plug-in (updated link Dec. 2011), the map looks for those values in the URL that summoned it, and reorients to that map if they exist.
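The round trip is simple enough to sketch outside the browser. The real map does this in JavaScript and jQuery; the Python below only illustrates the two halves, with invented parameter names (lat, lng, zoom, map) and a placeholder example.com address.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_share_url(base_url, lat, lng, zoom, map_choice):
    """Part 1: pack the current view into a shareable URL."""
    query = urlencode({"lat": "%.5f" % lat, "lng": "%.5f" % lng,
                       "zoom": zoom, "map": map_choice})
    return base_url + "?" + query


def read_share_url(url, defaults):
    """Part 2: recover the view from a URL, or fall back to the defaults."""
    params = parse_qs(urlparse(url).query)
    if all(key in params for key in ("lat", "lng", "zoom", "map")):
        return {"lat": float(params["lat"][0]),
                "lng": float(params["lng"][0]),
                "zoom": int(params["zoom"][0]),
                "map": params["map"][0]}
    return dict(defaults)


default_view = {"lat": 40.73, "lng": -73.99, "zoom": 10, "map": "population"}
share_url = build_share_url("http://example.com/census-map.html",
                            40.7145, -74.0071, 12, "population")
restored = read_share_url(share_url, default_view)
plain = read_share_url("http://example.com/census-map.html", default_view)
```

Falling back to the defaults when any value is missing keeps a plain, parameter-free link working as the normal front door to the map.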

Prep Work

Clearly, not all of this could happen in a couple of hours on Data Day. I'd been tinkering, testing and tweaking for a few weeks using New Jersey's data, which came out much earlier.

I also wrote down, edited and revised every step I took to make the maps. So when the adrenaline was running I had a script to follow.

The WNYC Web Team also set up a slick, fresh project server, at project.wnyc.org, to host the html pages and track the traffic.

Fusion Function

Using Google Fusion Tables made it super easy to manage, map and serve up a lot of data. And the FT feedback team was fantastic about responding to questions and glitches I encountered along the way.

I did run into a couple of hiccups: slow load times and pop-up bubbles that failed to pop up. The first was a product of displaying so much data -- and I knew I was pushing things. The second was a Google glitch that their engineers managed to fix within a few hours, but was still spotty at times afterward.

Also, the Google Map engine starts dropping shapes when there are too many to show. So I funneled different counties' data into almost a dozen different layers, a workaround the Google folks showed me ahead of time.

That said, I had time to code and tweak lots of neat things because I didn't have to focus on building or running a database engine. Google's free services took care of that.

What Could Be Better?

Probably a lot. I wanted to let people add comments, right on the map, but didn't have the chops or time to pull that off.

Another good thing would be a "Loading ..." indicator displayed while the map data is pulled into your browser, which I may yet add.

But what couldn't have been better was everything I learned, the help I got from other data folks and the support from my WNYC colleagues. Plus we gave New Yorkers a pretty nifty service and several great stories.

Need more details? Feel free to ask questions in the comments. Or drop me a line. I'll try to help, too.

Fast, Little Maps with Fusion Tables

Google Fusion Tables can handle huge amounts of data -- and seem designed for that. But a great little secret is that they're fantastic for making fast maps. Even little ones.

And it's surprisingly easy.

At WNYC, we used fusion tables for this quickie map of 63 taxi relief stands. My colleague Jim Colgan whipped together these plowed-streets maps (including the one below) from listeners' texted-in reports -- while he was sick in bed!

Some reasons we've been drawn to mapping with Google Fusion Tables:

Simple uploads. All you need is a comma-separated table (csv) or a spreadsheet made in Excel or Google Docs. Each "point" goes on a row. If you have even basic Excel skills, you're more than ready to go.

Embedded geocoding. Put addresses in one of your columns, and Google will geocode them for you -- doing the work of finding the latitude and longitude for your pin. If you already have the coordinates, that's fine, too. Here's the help page on this for more.

Customizable icons. You can designate one of your columns as the icon column, and use this map of available icons to pick names to put in that column for each point. There are some really clear instructions for this.

Custom popups. You can define what appears in a pin's pop-up bubble. Doing this is a little tricky, but just a little. In the "map" visualization, click on "Configure Info Window." I find the default templates confusing, so I choose "Custom" from the drop-down menu. You can then use text, html and the table info {in_curly_brackets} to craft a custom bubble.

Easy embeds. Zoom and position the map as you like it and then click the "Get Link" button for a link to what you see. Or click the blue "Get embeddable link" link to get the embed code. (Design note to Google folks: It's confusing that one of these is a button and one is an html link!)

Easy updates. You can add more data points easily, either with additional uploads or just typing your additions or fixes in your browser. 

Privacy controls. As with other Google products, you can click the "Share" button to control who can view and/or edit each table and map -- which is really nice for working in teams.

News maps on news time.  That's been working for us.

Update: Jim Colgan, who put together the snowplow map, talks about how he did it with the folks at Mobile Commons, who run the platform we use for texting projects.

Weaving a Patchwork Map ... in Real Time

We did something a little creative and unique at WNYC this past election night: We mapped the vote by "community type."

This revealed the diversity of the vote across New York State -- from the cities to the suburbs, boom towns and "service worker" centers -- in real time, on the air and on the WNYC home page.

And the diversity is striking. Despite Democratic wins in every statewide race, the Republicans running for state attorney general and comptroller "won" every community type outside "Industrial Metropolis" and "Campus & Careers" counties.

Patchwork Nation's Dante Chinni talked about this on air during WNYC's election night coverage, and has written more about it since.

The live map was a mashup of Patchwork Nation's unique take on the nation and the Associated Press's live vote totals. At the request of WNYC, Patchwork Nation programmers dove into the AP test results and quickly wove them into a new map based on PN's existing county maps -- customizing them for the event and adding real-time percentages by community type.

Bringing the Threads Together

In the months before the election, I had wondered how we might better understand the early returns -- those that come in typically between 9 p.m. and 10:30 p.m. -- which often don't match the final results. I wanted more clarity.

At a Hacks/Hackers Open-Source-a-Thon, I started playing with the election data with help from Al Shaw (then at TalkingPointsMemo, now at ProPublica) and Chrys Wu (of Hacks/Hackers and ONA fame).

That evolved into a little program I wrote in Sinatra that generated the vote-total map at the left, shading counties darker as more of their precincts reported. It also helped me better understand how the data were structured, how to retrieve the numbers and what it might take to make a live map.
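The darker-as-more-report shading comes down to mapping a fraction to a color. The original was a Sinatra (Ruby) app; this Python sketch only shows the idea, with a made-up function name and a plain white-to-black gray ramp chosen for simplicity.

```python
def reporting_shade(precincts_reporting, precincts_total):
    """Return a hex gray: white with nothing reported, black when all are in."""
    if precincts_total <= 0:
        return "#ffffff"  # no precincts known yet, so leave the county blank
    # Clamp to the 0-1 range in case the counts are briefly inconsistent
    fraction = min(max(precincts_reporting / precincts_total, 0.0), 1.0)
    level = round(255 * (1 - fraction))
    return "#%02x%02x%02x" % (level, level, level)


# Made-up county with 40 precincts
none_in = reporting_shade(0, 40)
all_in = reporting_shade(40, 40)
```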

So when Chinni asked if WNYC had any county-level data sets we'd like to put through the Patchwork Nation treatment, I had the perfect candidate.