Water Begone

Thousands of people live in the Hudson River.

That's what you'd think, at least, by looking at U.S. Census tract maps for New York City, because census tracts extend to the state line.

But a population map drawn like this isn't attractive, and isn't accurate, either. It suggests inhabited areas at the coast are far larger than they actually are.

So what's a journo-mapper to do?

Fortunately, the Census Bureau also publishes shapefiles of all of the water in the U.S. So one solution is to tell your trusty computer to subtract the water areas from the tracts -- and the difference will be the parts on land.

Doing this turned out to be far easier than I expected. (Thanks to Michael Corey and Nathan Woodrow, who responded to my help tweet.) Here's what I did:

1. Opened my census tract shapefile with QGIS (a free, open-source mapping program I'm getting to know).

2. Found the water shapefile for Manhattan (New York County) and opened it as a new layer in QGIS.

3. From the QGIS menu, selected Vector -> Geoprocessing Tools -> Symmetrical Difference and followed the prompts, choosing the tract shapefile as the "Input vector layer" and the water shapefile as the "Difference layer."

4. Compressed the resulting shapefile set into one .zip file and uploaded it to Google Fusion Tables using shpescape. Once it's in Fusion Tables, I can play with it, view the map and merge it with population data.
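
For anyone curious what the subtraction in step 3 does conceptually, here is a toy sketch in Python. It is not what QGIS runs internally. It treats each shape as a set of grid cells, an assumption made purely for illustration, and every name in it is hypothetical. It also suggests why a symmetrical difference gives the same answer as a plain difference only when the water layer sits entirely inside the tract layer.

```python
# Toy model: each shape is a set of (column, row) grid cells.
# Real shapefiles hold polygons; this only illustrates the set logic.

def cells(x_range, y_range):
    """Return the set of grid cells covering a rectangle."""
    return {(x, y) for x in x_range for y in y_range}

# A hypothetical coastal tract: 6 columns wide, 4 rows tall.
tract = cells(range(0, 6), range(0, 4))

# Hypothetical water covering the two westernmost columns of the tract.
water_inside = cells(range(0, 2), range(0, 4))

# Plain difference: tract cells that are not water.
land = tract - water_inside

# Symmetrical difference: cells that sit in exactly one of the two layers.
# Same result here, because every water cell is also a tract cell.
sym = tract ^ water_inside

# If the water layer spills past the tract edge, the two results diverge.
water_spill = cells(range(-1, 2), range(0, 4))
land_spill = tract - water_spill   # still just the dry tract cells
sym_spill = tract ^ water_spill    # also keeps the water outside the tract

print(len(tract), len(land), len(sym), len(land_spill), len(sym_spill))
```

If your result ever shows stray slivers of water beyond the tract edges, that divergence is one possible cause worth checking.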


A few extra notes and tips for those trying this at home:

- I've found water shapefiles only for individual counties, which makes doing an entire state a small pain. For New York City, which is five counties, I loaded the five water shapefiles into QGIS, made sure they were all visible, and used Layer -> Save Selection as Vector File... to save them all as one shapefile. I then used the resulting shapefile in the Symmetrical Difference process.

- Be sure the water map represents the same census year as the tract map (and, of course, your data). Very likely you'll be using shapefiles for the previous decennial census. For our New Littles map we had 2009 data, so we used the appropriate tract and water shapes -- which were from the 2000 census.

- I get an error about missing coordinate information when I do step 3, but it hasn't caused me any problems I know of. Also, on my Mac version of QGIS, the Symmetrical Difference window and the file-saving dialog box conflict -- but I just moved them to their own sides of the screen.

- Census tracts are made up of census "blocks," which are smaller and generally DO follow coastlines. So if you're mapping blocks, you can eliminate the watery ones by excluding any block with an "area land" of zero.

- The difference trick doesn't change the metadata associated with each tract, which generally is a good thing.
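
The block-filtering tip above can be sketched in a few lines of Python. This assumes the block attributes have already been read into a list of dictionaries, and that the "area land" field is named ALAND10 (check your own shapefile's attribute table; the name varies by census year). The GEOID values are made up.

```python
# Hypothetical block records; real ones come from the shapefile's attribute table.
# ALAND10 is assumed here as the name of the "area land" field.
blocks = [
    {"GEOID10": "360610001001000", "ALAND10": 14250},
    {"GEOID10": "360610001001001", "ALAND10": 0},    # all water
    {"GEOID10": "360610001001002", "ALAND10": 980},
]

def drop_water_blocks(records, land_field="ALAND10"):
    """Keep only blocks with a land area greater than zero."""
    return [r for r in records if r[land_field] > 0]

dry_blocks = drop_water_blocks(blocks)
print([b["GEOID10"] for b in dry_blocks])
```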

If you have any questions or suggestions, don't hesitate to post them in a comment below!

Screaming for a Map: The New "Littles"

When I saw the NYC ancestry data, I immediately thought, "That screams MAP!"

Brian Lehrer Show producer Jody Avrigan had been working on a great project looking for the new "Littles" in New York City -- neighborhoods where people of a certain ancestry or ethnicity live. He had a spreadsheet; I wanted to visualize it.

The result may be my favorite map project so far:

Mostly, I built on what I'd learned making WNYC's Census Maps, adding a few things:

• An on-map drop-down menu (here's the CSS code for that).

• Code that selects different data from a single Google Fusion Table.

• Panning and zooming to the neighborhood I want to highlight.

• A better "Share or Embed" pop-up box using jquery.alerts.js.

I also tried to clean up and refactor my original code to make it easier to read (and reuse).

You can see that code on GitHub. I tried to document it clearly, but post a note below if you have any questions or would like clarification.

UPDATE: In making this map, I used a new (to me) trick to remove the water areas from census tract shapes on the coastline. Here's how I did it, if you're interested.

911 vs Google Maps

Two weeks ago today, I called 911. It was an unsettling experience.

Walking by Inwood Hill Park in northern Manhattan, my wife spotted plumes of smoke rising through the trees. There was a fire in the woods, and it was growing.

My call to 911 started at 3:14 p.m. and lasted 3 minutes, according to my iPhone's log. Astonishingly, the operator spent almost all of that time -- probably 2.5 minutes -- trying to find my location on her computer.

Later, using the same information, I did it in 16 seconds. That's the time it takes to type "maps.google.com" into a browser and then "seaman avenue and 214th street nyc."

911 = 150 seconds.

Google Maps = 16 seconds.

Now, this is not a journalistic exploration of why it took so long for the operator to locate me. It is merely my experience. But it's startling enough that I think it is worth a careful recounting. It seems New York City doesn't release 911 calls as a matter of course, though I hope to get mine for a precise transcript of what happened. But the night of the call, I did my best to write down what happened:

• When the operator first answered, I said there's a fire in the woods "in Inwood Hill Park at Seaman Avenue and 214th Street." The park is big, and the fire was across a baseball field in the woods, but it was visible from where I was standing and two entrances are nearby. So where I was standing seemed a good location to report.

• The operator asked me if I meant East 214th Street, and I said no, West 214th Street. (For what it's worth, Seaman Avenue doesn't cross East 214th Street.)

• The operator said she couldn't pull up that intersection, eventually asking me if she had spelled Seaman Avenue correctly: S-e-a-m-a-n. Yes, I said, that's right.

• She said to me again that it "wasn't coming up" but kept trying.

• I suggested another cross street, Isham Street, and she said, "In the Bronx?" No, I said surprised, Manhattan.

• That fixed it ... she was able to find my location.

• She then asked me to hold while she connected me to another operator. After several rings, she verbally conveyed my information to the second operator, mistakenly saying "the Bronx" -- which I corrected as she caught herself, "Manhattan!"

Three minutes.

A fire engine arrived a short time later and quickly got the fire under control.

Here are several related searches on Google Maps, all of which return results in less than a second:

"inwood hill park nyc" returns a pin on the west side of the park -- which isn't where the fire was. A fire truck going there would have been misdirected. But it's clearly in Manhattan. And the resulting map would have been a good starting place to work with me to pinpoint my location: "OK, I see Seaman Avenue running along the park ... where exactly from there?"

"214th street and seaman avenue nyc" returns a pin exactly where I was standing. No question about Manhattan or the Bronx.

"seaman avenue and east 214th street nyc" does the same thing, correcting to West 214th Street.

"seaman avenue bronx" returns a pin at 207th Street and Seaman Avenue, correctly in Manhattan, near the entrance to the park -- and, in this case, in sight of the smoke.

WNYC has done some good reporting on 911, but we never had such a concrete example of address confusion. I wonder if other people are having the same problem.

A Customized Viewer for DocumentCloud

This post is for newsrooms using DocumentCloud, the fantastic document viewer developed by journalist-programmers at ProPublica and The New York Times.

Want a custom viewer for your site's documents? You can have ours.

I built it so that once set up, this viewer will automagically fill in the title, source and "back-to-article" link based on information already associated with the document -- so one file serves all of your documents.

Here's how.

One-time Setup

You can make this work with a little knowledge of html and access to a web server. You'll need to host a single html page, called dc.html, and a tiny JavaScript file, called jquery.url.min.js.

1. Download the html code for dc.html by right-clicking on this link (or view it here).

2. Use any text editor to edit the path to your logo image on line 101. (A logo that's 60 pixels high works well).

3. On line 101, change "www.wnyc.org" to your site's home page.

4. Upload the file dc.html to a web server.

To extract the document info from the URL, the page uses a little JavaScript program called jquery.url.min.js, which you can read about here and download here. Once you do:

5. Upload jquery.url.min.js to your web server (the page assumes it's in the js/ subdirectory).

6. If you need to change the location of jquery.url.min.js, edit the path on line 38 of the html code and re-upload.
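
The viewer does its URL reading in JavaScript with jquery.url.min.js. For readers who want to see the idea on its own, here is a minimal Python sketch of pulling a "doc" value out of a link. It is not the plugin's code, and example.com is a stand-in for wherever you host dc.html.

```python
from urllib.parse import urlparse, parse_qs

def document_id_from_url(url):
    """Return the value of the "doc" query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("doc")
    return values[0] if values else None

# example.com stands in for your own server.
link = "http://example.com/dc.html?doc=11275-bill-a11354"
print(document_id_from_url(link))
```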

Using the Viewer

To use the viewer, simply construct a link to it that combines dc.html's location and the ID of the document you want it to load. For example, the base URL for WNYC's version of dc.html is here:


And the document I want to display is here:


I combine them into a new link by taking the base URL, adding "?doc=" and then adding the document ID -- which, here, is 11275-bill-a11354 (omitting the .html). Like this:



Pages and Annotations

For extra trickiness, you can jump to specific page numbers and annotations by adding references to them into your link. Here you need to append "#document/p" and the page number. So for page 2, you'd use:


And for the annotation on page 3, it would be:


(You get the annotation number -- and the whole phrase after the #, actually -- by clicking on the little "link" icon next to the annotation's title.)
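
If you build a lot of these links, the pattern is easy to script. Below is a hedged Python sketch of the recipe described above: base URL, then "?doc=" and the document ID, then an optional "#document/p" page reference. The function name is hypothetical and example.com stands in for your own server. For annotations, it simply appends whatever phrase you copied from the link icon.

```python
def viewer_link(base_url, doc_id, page=None, fragment=None):
    """Build a link to the viewer for one document.

    page     -- optional page number, appended as "#document/p<page>"
    fragment -- optional full text after the "#", as copied from an
                annotation's link icon; takes precedence over page
    """
    link = f"{base_url}?doc={doc_id}"
    if fragment is not None:
        link += "#" + fragment.lstrip("#")
    elif page is not None:
        link += f"#document/p{page}"
    return link

# example.com stands in for wherever dc.html is hosted.
base = "http://example.com/dc.html"
print(viewer_link(base, "11275-bill-a11354"))
print(viewer_link(base, "11275-bill-a11354", page=2))
```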

That's it. 

Credits and Disclaimers

The base design is built on code the Chicago Tribune News Apps Team wrote, which I modified with help from the DocumentCloud folks to dynamically take up the title, source and related-story information from the document's metadata.

Note that the version of dc.html at project.wnyc.org contains extra tracking code specific to our servers. The version here does not. It's the one you should download.

And I don't warrant in any way that this is perfect code, so please use at your own risk.

If you modify it -- especially if you improve on what's here -- please let me know and I'll share the updates here and on GitHub.

Snow & Ice Violation Map

WNYC's Ilya Marritz has a story today about how violations for unshoveled sidewalks reveal what may be the most neglected block in New York City.

Ilya zeroed in on that block because once he got the records from the city, we sorted them using Google Refine.

Tonight I got all of the data on the map in a way I finally like. I started using Google Fusion Tables, but was unhappy with how slowly and erratically the map loaded.

Instead, I batch-geocoded the entire data set using this tool and then wrote a little Ruby program that uses a gem called kamel to generate a KML file. I tinkered with the icons and then zipped it into a tiny KMZ file -- just 56K -- which loads in a flash.
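
The original was a Ruby program using the kamel gem. As a rough illustration of the same two steps, writing placemarks to KML and zipping the result into a KMZ, here is a Python sketch that uses only the standard library. The sample points are made up, and the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def build_kml(points):
    """Return KML text with one Placemark per (name, lat, lng) tuple."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lng in points:
        mark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(mark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(mark, f"{{{KML_NS}}}Point")
        # KML wants longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lng},{lat}"
    return ET.tostring(kml, encoding="unicode")

def build_kmz(kml_text):
    """Zip KML text into KMZ bytes; doc.kml is the conventional file name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml_text)
    return buffer.getvalue()

# Made-up sample points, not the real violation data.
sample = [("Violation A", 40.7128, -74.0060), ("Violation B", 40.8676, -73.9212)]
kml_text = build_kml(sample)
kmz_bytes = build_kmz(kml_text)
print(len(kml_text), len(kmz_bytes))
```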

Making the WNYC Census Map

When the New York census numbers arrived this week, we were ready. WNYC quickly published an interactive, sharable map so New Yorkers -- and our reporters -- could explore the new data and see the stories.

We built the map with free tools and timely help from some smart, kind people.


The short story is that we mashed together population numbers and geographic shapes using Google Fusion Tables, and then used JavaScript and Fusion Tables' mapping features to make things pretty and interactive.

The long story is meandering and full of wrong turns. But here are the highlights, should anyone need a little navigation. Don't hesitate to contact me for more help and insight; I'm due to pay some forward.

Getting in Shape

First up: Shapes of the census tracts plotted on Earth. I downloaded New York's tracts from the U.S. Census Bureau's TIGER/Line Shapefile page. They also have counties, blocks, zip codes, and more.

Then I uploaded this "shapefile" -- actually a collection of related files zipped together -- to Fusion Tables with a free, online tool called Shpescape. (Thanks to Google's Rebecca Shapley for sharing this key to my puzzle.)

Hello, Data

Census data is publicly available, but can be a hassle to handle. In fact, on the day each state's info was released, the files were available in a set that apparently requires one of two pricey programs -- SAS or Microsoft Access -- to assemble. 

So I got clean, assembled, comma-delimited files -- complete with 2000-to-2010 comparisons -- from the USA Today census team, which provided them as a courtesy to members of Investigative Reporters and Editors. Huge props to Anthony DeBarros and Paul Overberg, who crunched the New York numbers in a blazing 30 minutes.

By the way, IRE membership is $60 for professionals and $25 for students. Well worth it, and cheaper than either of those programs. If you're digging into census numbers and qualify, I recommend this route.

That said, every state's 2010 data is now available free from the Census Bureau's American Fact Finder. Navigating the site is a little tricky, and worth a separate post, but the bureau provides some tutorials, and there's a very detailed PDF about each data field.

With data in hand, I uploaded it to Google Fusion Tables in another table.

Map Making

Next, I merged the shapes table and population table, using the unique tract ID to marry the data (the shapefile calls it GEOID10; the IRE data calls it FULLTRACT). Note that the GEOID10 is formatted differently depending on whether you're using tracts, blocks, counties, etc., so be sure you've got the right match in both files.
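
Fusion Tables did the merge through its interface, but the logic is a plain join on the tract ID. Here is a small Python sketch of that join, with made-up rows and hypothetical population column names. It also collects any shapes that found no match, which is a quick way to catch the ID-format mismatch mentioned above.

```python
# Hypothetical rows standing in for the two tables.
shapes = [
    {"GEOID10": "36061000100", "geometry": "<Polygon>...</Polygon>"},
    {"GEOID10": "36061000201", "geometry": "<Polygon>...</Polygon>"},
]
population = [
    {"FULLTRACT": "36061000100", "POP2000": 1200, "POP2010": 1350},
    {"FULLTRACT": "36061000201", "POP2000": 3100, "POP2010": 2900},
]

def merge_on_tract(shape_rows, data_rows, shape_key="GEOID10", data_key="FULLTRACT"):
    """Join the two tables on tract ID; report shape IDs with no data match."""
    data_by_id = {str(row[data_key]): row for row in data_rows}
    merged, unmatched = [], []
    for shape in shape_rows:
        match = data_by_id.get(str(shape[shape_key]))
        if match is None:
            unmatched.append(shape[shape_key])
        else:
            merged.append({**shape, **match})
    return merged, unmatched

merged, unmatched = merge_on_tract(shapes, population)
print(len(merged), unmatched)
```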

Clicking Visualize -> Map shows a map. It'll be all default-red until you click on Configure Styles -> Fill Colors -> Gradient (or ->Buckets) and make different colors appear depending on values in the column of your choice.

Using the Share button makes the map viewable by others, and "Get embeddable link" does just that.

Adding Prettiness

I used the Fusion Tables "Configure info window" option to make custom pop-up bubbles on our maps. This actually required some nicer-looking data, such as columns with rounded percentages and + or – signs. (I added these using the free R statistical program, which I learned how to do from The New York Times' Amanda Cox at the 2011 Computer Assisted Reporting conference.)
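
The original work was done in R. To show the kind of column this describes, here is a hedged Python sketch that turns two population counts into a rounded percentage with an explicit sign. The function name and the one-decimal rounding are assumptions.

```python
def signed_percent(old, new, digits=1):
    """Format the change from old to new as a signed, rounded percentage."""
    if old == 0:
        return "n/a"  # percent change is undefined from a zero base
    change = (new - old) / old * 100
    rounded = round(change, digits)
    if rounded == 0:
        rounded = 0.0  # avoid "-0.0"; a rounded zero gets no sign below
    sign = "+" if rounded > 0 else ""
    return f"{sign}{rounded:.{digits}f}%"

print(signed_percent(1200, 1350))
print(signed_percent(3100, 2900))
```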

Census tracts officially extend to the state lines, which made it look like a lot of people live in the Hudson River. So I trimmed those tracts to the shorelines with a free mapping program called QGIS, using water shapefiles as a reference (those are here, in the drop-down menu).

After creating 12 merged Fusion Tables, I pulled them into one page using JavaScript and jQuery, with fantastic guidance from Joe Germuska at the Chicago Tribune (part of the team that built this great map).

The "Share/Embed this view" feature came together in two parts: 1) The JavaScript turns the current latitude, longitude, zoom level and map choice into a long URL that pops up when you click the Share/Embed link. 2) Using a nifty jQuery plug-in (updated link Dec. 2011), the map looks for those values in the URL that summoned it, and reorients to that map if they exist.
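
The map itself does this in JavaScript. As an illustration of the round trip, packing the view into a URL and reading it back out, here is a minimal Python sketch. The parameter names (lat, lng, zoom, map) and the example.com address are assumptions, not the names the WNYC map uses.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def share_url(base_url, lat, lng, zoom, map_choice):
    """Encode the current map view as query parameters on the page URL."""
    params = {"lat": lat, "lng": lng, "zoom": zoom, "map": map_choice}
    return f"{base_url}?{urlencode(params)}"

def view_from_url(url):
    """Recover the view from a shared URL, or None if any value is missing."""
    query = parse_qs(urlparse(url).query)
    try:
        return {
            "lat": float(query["lat"][0]),
            "lng": float(query["lng"][0]),
            "zoom": int(query["zoom"][0]),
            "map": query["map"][0],
        }
    except (KeyError, ValueError):
        return None

url = share_url("http://example.com/census.html", 40.7128, -74.006, 11, "pop_change")
print(url)
print(view_from_url(url))
```

Returning None when a value is missing mirrors the behavior described above: the page only reorients if the values exist.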

Prep Work

Clearly, not all of this could happen in a couple of hours on Data Day. I'd been tinkering, testing and tweaking for a few weeks using New Jersey's data, which came out much earlier.

I also wrote down, edited and revised every step I took to make the maps. So when the adrenaline was running I had a script to follow.

The WNYC Web Team also set up a slick, fresh project server, at project.wnyc.org, to host the html pages and track the traffic.

Fusion Function

Using Google Fusion Tables made it super easy to manage, map and serve up a lot of data. And the FT feedback team was fantastic about responding to questions and glitches I encountered along the way.

I did run into a couple of hiccups: slow load times and pop-up bubbles that failed to pop up. The first was a product of displaying so much data -- and I knew I was pushing things. The second was a Google glitch that their engineers managed to fix within a few hours, but was still spotty at times afterward.

Also, the Google Map engine starts dropping shapes when there are too many to show. So I funneled different counties' data into almost a dozen different layers, a workaround the Google folks showed me ahead of time.

That said, I had time to code and tweak lots of neat things because I didn't have to focus on building or running a database engine. Google's free services took care of that.

What Could Be Better?

Probably a lot. I wanted to let people add comments, right on the map, but didn't have the chops or time to pull that off.

Another good thing would be a "Loading ..." indicator displayed while the map data is pulled into your browser, which I may yet add.

But what couldn't have been better was everything I learned, the help I got from other data folks and the support from my WNYC colleagues. Plus we gave New Yorkers a pretty nifty service and several great stories.

Need more details? Feel free to ask questions in the comments. Or drop me a line. I'll try to help, too.

Where's the Next Bus? I'll Tell You

When New York City released real-time bus info for a Brooklyn line, one of my colleagues wasn't happy.

Yes, she could use a smartphone to see buses on an MTA map. Yes, she could get location information by texting the code for her stop (she lives near the route). But none of this was simple enough.

"I want a phone number that will TELL me when the next bus is coming," she said.

I'll have it for you by the end of the day, I replied.

It was a bit of a gamble. More of a challenge to myself, really. But if "before midnight" counts as the end of the day, then I succeeded. Try it:

Dial 646-480-7193. When prompted, enter 308333 or any of the bus stop codes for the B63 line.

It's not journalism, but it is a working example of how someone can take public data and turn it into a useful tool, quickly. And at almost no cost.

How I Did It

First, I wrote a little program in Sinatra that sends a 6-digit bus stop code to the NYC Metropolitan Transportation Authority API -- or application programming interface -- whenever someone hits my program's web address. (The API is a public portal to the live bus data. All you need is a little programmer know-how and a free key from the MTA. The technical details are right here.)

The API sends back 77 lines of information about the stop and the buses approaching it, including the one detail I want:


The next bus is three stops away. I use a Ruby tool called Nokogiri to find and extract just this number, which I drop into an amazingly simple web page. The page's entire output looks like this:

The next bus to arrive at 14th Street and Fifth Avenue heading north is 3 stops away.
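
The real program used Ruby and Nokogiri. To illustrate the extraction step, here is a Python sketch that uses the standard library's ElementTree instead. The XML below is a made-up, heavily trimmed stand-in for the API response, and its element names are assumptions for illustration, so check the MTA's documentation for the real structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the API's XML response.
# The element names are assumptions for illustration only.
SAMPLE_RESPONSE = """
<StopMonitoring>
  <StopName>14th Street and Fifth Avenue</StopName>
  <Direction>north</Direction>
  <StopsFromCall>3</StopsFromCall>
</StopMonitoring>
"""

def next_bus_sentence(xml_text):
    """Pull the stop name, direction and stops-away count into one sentence."""
    root = ET.fromstring(xml_text)
    stops = root.findtext(".//StopsFromCall")
    if stops is None:
        return "No buses are approaching this stop right now."
    name = root.findtext(".//StopName")
    direction = root.findtext(".//Direction")
    return (f"The next bus to arrive at {name} heading {direction} "
            f"is {stops} stops away.")

print(next_bus_sentence(SAMPLE_RESPONSE))
```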

That accomplished, I bought a phone number from Twilio for $1 a month, and $0.01/call. Twilio provides a telephone connection to web-based programs, which I first heard about during a demo at a TimesOpen event.

I set my new phone number to hit my program's URL whenever someone calls. When the text is wrapped in special tags, Twilio recognizes it as a cue to talk:

The next bus to arrive at 14th Street and Fifth Avenue heading north is 3 stops away.
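
Twilio's markup language, TwiML, uses a Response element containing a Say element for spoken text. Assuming those are the tags in play here, a minimal Python sketch of producing that wrapper looks like this (the function name is hypothetical, and this is not the program's own code):

```python
import xml.etree.ElementTree as ET

def say_response(message):
    """Wrap a message in TwiML <Response> and <Say> tags."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = message
    return ET.tostring(response, encoding="unicode")

twiml = say_response(
    "The next bus to arrive at 14th Street and Fifth Avenue "
    "heading north is 3 stops away."
)
print(twiml)
```

Building the XML with a library rather than by pasting strings together means characters such as "&" in a stop name get escaped properly.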


I later used Twilio's <Gather> tags to build what's essentially a web form to capture digits entered on a phone for any bus stop. I also added error catching, for when no buses are coming, and programmed it to announce the location of the following bus if the next bus is arriving.

Some Hiccups

Turns out that Twilio reads text a little too fast to be understood on an NYC street corner. So I rewrote the output to introduce pauses:

The next bus to arrive at ... 14th Street ... and ... Fifth Avenue ... heading ... north ... is ... 3 ... stops ... away.
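
One simple way to produce that kind of output is to keep the sentence as a list of chunks and join them with a pause marker. This Python sketch assumes the " ... " marker shown above is what slows the speech down. The chunk boundaries are a judgment call.

```python
def add_pauses(chunks, pause=" ... "):
    """Join spoken chunks with a pause marker between each one."""
    return pause.join(chunks)

chunks = ["The next bus to arrive at", "14th Street", "and", "Fifth Avenue",
          "heading", "north", "is", "3", "stops", "away."]
print(add_pauses(chunks))
```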

Also, there's a bug in the API system that sends back server errors in certain conditions. Word is that the MTA has actually fixed this on their development servers, and that fix is being pushed to the public system pretty soon. 

More to Come

I have a few enhancements up my sleeve, which should be done in a week or two. If you'd like to know when new tricks roll out, drop a note with "Bus Talk" in the subject line to john (at) johnkeefe.net. I'd love to hear your thoughts on how it could work better, too.

I'll also release the source code after those nifty updates. Let me know, too, if you're interested in that.

Photo by jbrau13 on flickr.

Fast, Little Maps with Fusion Tables

Google Fusion Tables can handle huge amounts of data -- and seem designed for that. But a great little secret is that they're fantastic for making fast maps. Even little ones.

And it's surprisingly easy.

At WNYC, we used Fusion Tables for this quickie map of 63 taxi relief stands. My colleague Jim Colgan whipped together these plowed-streets maps (including the one below) from listeners' texted-in reports -- while he was sick in bed!

Some reasons we've been drawn to mapping with Google Fusion Tables:

Simple uploads. All you need is a comma-separated table (csv) or a spreadsheet made in Excel or Google Docs. Each "point" goes on a row. If you have even basic Excel skills, you're more than ready to go.

Embedded geocoding. Put addresses in one of your columns, and Google will geocode them for you -- doing the work of finding the latitude and longitude for your pin. If you already have the coordinates, that's fine, too. Here's the help page on this for more.

Customizable icons. You can designate one of your columns as the icon column, and use this map of available icons to pick names to put in that column for each point. There are some really clear instructions for this.

Custom popups. You can define what appears in a pin's pop-up bubble. Doing this is a little tricky, but only a little. In the "map" visualization, click on "Configure Info Window." I find the default templates confusing, so I choose "Custom" from the drop-down menu. You can then use text, html and the table info {in_curly_brackets} to craft a custom bubble.

Easy embeds. Zoom and position the map as you like it and then click the "Get Link" button for a link to what you see. Or click the blue "Get embeddable link" link to get the embed code. (Design note to Google folks: It's confusing that one of these is a button and one is an html link!)

Easy updates. You can add more data points easily, either with additional uploads or just typing your additions or fixes in your browser. 

Privacy controls. As with other Google products, you can click the "Share" button to control who can view and/or edit each table and map -- which is really nice for working in teams.
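
To make the upload and pop-up ideas above concrete, here is a small Python sketch. It writes a table with one point per row, including an address column for geocoding and an icon column, then fills a curly-bracket template for a single row. The column names, addresses and icon name are examples only, and the template filling is just an imitation of what the info window does.

```python
import csv
import io

# Hypothetical rows; the column names and icon name are examples only.
rows = [
    {"name": "Relief Stand 1", "address": "350 5th Ave, New York, NY", "icon": "small_red"},
    {"name": "Relief Stand 2", "address": "1 Centre St, New York, NY", "icon": "small_red"},
]

def to_csv(records):
    """Write one row per point, with a header row of column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def render_bubble(template, record):
    """Fill {column} placeholders, roughly as a custom info window would."""
    return template.format(**record)

csv_text = to_csv(rows)
bubble = render_bubble("<b>{name}</b><br>{address}", rows[0])
print(csv_text)
print(bubble)
```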

News maps on news time. That's been working for us.

Update: Jim Colgan, who put together the snowplow map, talks about how he did it with the folks at Mobile Commons, who run the platform we use for texting projects.