Sharing NYC Police Precinct Data

Note: This post was originally published April 29, 2011, and updated in June 2020. In February 2022, I updated it again using 2020 Census data. 

Anyone doing population analysis by NYC police precinct might find this post helpful, especially if you're interested in race and/or ethnicity analysis by precinct.

Back in 2011, I wanted to compare the racial and ethnic breakdown of low-level marijuana arrests — reported by police precinct — with that of the general population. The population data, of course, is available from the US Census, but it's not broken out by police precinct, and precincts don't follow any major census boundaries, like census tracts. Instead, they generally follow streets and shorelines. Fortunately, census blocks (which, in New York, are often just city blocks) also follow streets and shorelines.

So I used US Census block maps and precinct maps from the city to figure out which blocks are in which precincts. Since population data is available at the block level, that data can then be aggregated into precincts.
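
If you want to script that block-to-precinct matching yourself, a spatial join gets you most of the way there. Here's a rough sketch in Python with geopandas; the file names and the precinct column name are assumptions, and this is just one way to do the matching, not necessarily how I did it:

# Sketch: assign each census block to the NYPD precinct that contains its centroid.
# File names and the "precinct" column are placeholders / assumptions.
import geopandas as gpd

blocks = gpd.read_file("tl_2020_36_tabblock20.shp")   # 2020 TIGER blocks for New York
precincts = gpd.read_file("nyc_precincts.geojson")    # NYPD precinct boundaries

# Put both layers in the same projected CRS (NY State Plane) before taking centroids.
blocks = blocks.to_crs(epsg=2263)
precincts = precincts.to_crs(epsg=2263)

# Using centroids avoids double-counting blocks that touch a precinct boundary.
centroids = blocks.copy()
centroids["geometry"] = blocks.geometry.centroid

matched = gpd.sjoin(centroids, precincts[["precinct", "geometry"]],
                    how="left", predicate="within")
print(matched[["GEOID20", "precinct"]].head())

Matching on centroids is a judgment call: blocks generally nest inside precincts, but centroids keep a boundary-hugging block from landing in two precincts at once.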

In this, the third version of this post, I've updated the counts now that the 2020 population data is available.

The 2020 data

• nyc_precinct_2020pop.csv is the 2020 Census population, race, and ethnicity (Hispanic/non-Hispanic) data by NYPD police precinct. The column headers from the US Census are a little cryptic, but you can translate them using the P1 table metadata file and the P2 table metadata file.

• nyc_block_precinct_2020pop.csv lists every populated block in NYC by its ID (called "GEOID20"), matches it to the police precinct it sits within, and includes the block's race/ethnicity counts. Use the same metadata tables to translate the column headers, and be sure to read the caveats below. (A quick aggregation sketch follows this list.)

• nyc_precincts.geojson depicts the geographic boundaries of the NYPD precincts I used for the files above, as they existed in February 2022. As of this post, the information on the NYC Open Data portal indicates it was last updated on Nov. 24, 2021.
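
If you'd rather build your own precinct totals from the block-level file, the aggregation is a quick pandas group-by. A minimal sketch, assuming the precinct column is named "precinct" and using a made-up stand-in for whichever population column the P1/P2 metadata points you to:

# Sketch: roll block-level counts up to precinct totals.
# "precinct" and "P0010001" (total population) are assumed column names;
# check the P1/P2 metadata files for the real ones.
import pandas as pd

blocks = pd.read_csv("nyc_block_precinct_2020pop.csv", dtype={"GEOID20": str})

precinct_totals = (blocks.groupby("precinct")["P0010001"]
                         .sum()
                         .sort_values(ascending=False))
print(precinct_totals.head())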

Caveats for the 2020 data

The biggest caveat is that the US Census Bureau has introduced data fuzziness, or "noise," to make it difficult to identify individuals based on census data. This fuzziness is more pronounced at smaller geographies — the smallest being census blocks, which I've used for these calculations. Hansi Lo Wang did a great primer on these data protections for NPR, and the US Census Bureau has put out a lot of material on how it uses "differential privacy."

The Nevada Vote: In 3-D

The Guardian pushed the limits of election-night data display this week with a relief map of the Florida primary vote. 

They didn't push far enough.

As promised: Live election results in True 3-D.

[ Image: Nevada 3-D still ]

(To avoid blog lag, I've put the live version here.)

You need a current browser to see it. Recent versions of Chrome and Firefox work. Safari does, too, if you nudge it.

With any luck, the counties shall grow as the vote rolls in tonight.

For those interested, I built it in Processing and used Processing.js to put it on the web. You're welcome to embed it if you wish. Just drop me a note or comment that you did.

UPDATE: My data-fetching code is a little wonky. Refresh the page to make sure you're seeing the latest results!

UPDATE 2: I actually don't believe this is the best way to present numeric data. Representing numeric scale with a 3D drawing on a 2D surface is exceptionally tricky and should probably be avoided. Also, there are no rollovers or other clarifying information -- like county names and vote counts.

That said, I like the idea that some data sets might be worth spinning, touching and flying through. So maybe this is my first step in that direction.

Plus, it was fun.

UPDATE 3: By request, here is the Processing sketch upon which this was built. 

Free, Live Election Data: Now's Your Chance to Play

UPDATED in two key spots below.

Election geeks, you are in luck. For the second time, Google plans to offer free, real-time election results, allowing anyone to tinker and play with hard-to-get voting numbers.

It's for the Nevada Republican caucuses this Saturday, February 4, and even if you have no connection to Nevada, it's a chance to experiment with live results like the Big Guys. Make a map. Mash up some data. Have fun.

The first time Google did this, we made this Iowa caucuses results map at WNYC, mashing up Patchwork Nation community types with the live vote tally. And since we've been through it once, I've got some tips and tricks for making your own project.

My only request: Send me a link to whatever you make. I'd love to see it.

Setting the Fusion Table

Updated: The Google folks are providing live tallies from the Nevada GOP in two Fusion Tables -- one by county and one by precinct -- which will be updated with new data throughout the evening. 

This means you get all of the functionality of those tables, including simple charts and cool maps. Check out these posts to get started with Fusion Tables, if you're not already familiar with them.

Urge to Merge

My favorite part of Fusion Tables is that you can easily merge (or join, in SQL-speak) two separate tables of data. In this case, you'll be able to merge any data organized by Nevada's 17 counties (one's actually an independent city). Unemployment figures, Social Security recipients and any U.S. Census designation you can think of are just a few of the possibilities.

Updated 11:39 a.m. 2/2/2012: This section originally talked about merging on the county's unique FIPS code -- which turns out to be tricky, since the results table doesn't have those codes. But if your data has the Nevada county names, you can merge using the name as the key (provided the names are identical in both tables). Or you can add the county names to your data by adding a column and entering them by hand.

For reference, or to map the shapes of the Nevada counties, you can use this table I built merging data from the U.S. Census (which calls the FIPS codes "GEOID10") and the live election data.

No matter how you do it, once merged, you'll end up with a larger table containing all of your mashup data -- unemployment, number of children, etc. -- lined up next to the live vote data. Even though it's a new table, it'll update in real time with the underlying vote table.
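
If you'd rather do the same mashup offline, the merge-by-county-name idea translates directly to pandas. A minimal sketch with made-up file names, assuming a "county" column in each file; the usual gotcha is making the names match exactly:

# Sketch: join county-level results to your own data by county name.
# File and column names are placeholders.
import pandas as pd

results = pd.read_csv("nv_results_by_county.csv")
unemployment = pd.read_csv("nv_unemployment_by_county.csv")

def county_key(names):
    # Normalize so "Carson City " and "carson city" still match.
    return names.str.strip().str.lower().str.replace(" county", "", regex=False)

results["key"] = county_key(results["county"])
unemployment["key"] = county_key(unemployment["county"])

merged = results.merge(unemployment, on="key", how="left")
print(merged.head())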

Welcome, Json

If you're a JavaScripter, it is super easy to get the data you want from Google's results table, or a merged table you built with it.

First, construct a query URL according to the Google Fusion Tables documentation. This can be a little tricky, but with some tinkering you can make it work. Be sure to encode commas, greater-than signs and other symbols. Here's a nifty URL encoder if you need to convert all or part of the URL. Also, surround with single quotes any column name containing dashes, such as 'VoteCount-Paul'.

For a simple example, take a peek at this "shoes" table. Then try this URL:

https://www.google.com/fusiontables/api/query?sql=SELECT+Product%2C+Inventory+FROM+274409&jsonCallback=foo

A little decryption here: The + signs are spaces, and the %2C codes are commas. The table number we're looking at is 274409. So the syntax is "SELECT Product, Inventory FROM 274409." Append &jsonCallback=foo and you get back JSON. If you're using a jQuery AJAX call, as you'll see below, make it &jsonCallback=?
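
If you're building these URLs in code, it's easier to let an encoder do the escaping than to hand-type the %2Cs. A minimal Python sketch using the shoes example above (the aggregate query in the comment is just an illustration of the same idea):

# Sketch: build an encoded Fusion Tables query URL instead of escaping by hand.
from urllib.parse import quote_plus

base = "https://www.google.com/fusiontables/api/query"

sql = "SELECT Product, Inventory FROM 274409"
# The same approach works for aggregate queries,
# e.g. "SELECT SUM(Inventory) FROM 274409".
url = base + "?sql=" + quote_plus(sql) + "&jsonCallback=foo"
print(url)
# -> https://www.google.com/fusiontables/api/query?sql=SELECT+Product%2C+Inventory+FROM+274409&jsonCallback=foo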

You should get a text file that looks like this:

foo({"table":{"cols":["Product","Inventory"],"rows":[["Amber Bead",1.251500558E9],["Black Shoes",356],["White Shoes",100]]}})

Voila! JSON.

To get the statewide total for Iowa, I made a crazylong URL that requests sums of the columns I wanted.

Pro-tip: If you filter or aggregate the data within Fusion Tables using Options->Filter or Options->Aggregate, the "query" you're using appears above the results. Use that to help form the URL after the query?sql= part.

Inside the JavaScript map application, I used jQuery's $.getJSON() function to hit that URL and load in the data, and setTimeout() to do it every two minutes. You can see and use the code here.

Try and Learn

If you've ever dreamed of making your own election-night results map, or just like the thrill of a new challenge, don't let this opportunity pass you by. It's lucky that we get a chance to play with free, live and well-structured voting information. And no matter what you learn in the process, I bet it'll be valuable down the road.

Maybe even in November.

As always, don't hesitate to contact me -- or post a comment -- with questions, clarifications and ideas. And if you're inspired to make something for Nevada's primary, definitely drop me a note and a link!

[ Map detail: Patchwork Nation - Votes for Barack Obama in 2008, by county ]

Making AP Election Data Easy with Fusion Tables

This post is for journalists who use (or would like to use) election data from the Associated Press -- which is a paid service the AP provides. If that describes you, read on!

When Google gave away free, live election data for the Iowa Caucuses, something struck me right away: It was easy.

Data provided by the Associated Press, which drives almost every election site you've ever seen, is notoriously tricky to manage -- a statement I'm confident making based on talks with many election-night veterans and on my own experience.

But Google's results were posted in a public Google Fusion Table, which is basically a spreadsheet on steroids. That meant I could get the data I wanted simply by constructing the correct URL. Votes by county, sorted by county? No problem. Candidate totals for the entire state? Sure. Votes mashed with other data I had? Yup. Formatted in JSON? Bring it.

Instantly. Easily.

(Here are the URLs I used above, and here's the documentation from Google on how to construct them. Hard-to-find tip: Append &jsonCallback=anything to get the json. And if you're using jQuery AJAX calls, make it &jsonCallback=?)

A week later, for the New Hampshire Primary, there was no free Google data. So I made an AP data-fetcher-and-wrangler based on code by Al Shaw. Through no fault of Al's code, my adaptation was slow, complicated and crashed every couple of hours. It worked, but just barely.

Next up was South Carolina, and I was determined to make AP's data friendlier by putting it in a Google Fusion Table.

And it worked.

How I did it

In the interest of time and clarity -- and to spark discussion before the primaries are over (or irrelevant) -- I'm leaving out a bunch of the nitpicky details. If you're an AP Elections subscriber and want to try this, contact me at john (at) johnkeefe.net. I'll help you any way I can.

AP provides data in several formats, including a "flat file," which basically is a huge, semicolon-delimited spreadsheet. Each row represents a county, and each column holds one of the latest stats for that county, such as precincts reporting or a candidate's total votes.

The flat file doesn't have column headers, though. So I first uploaded AP's South Carolina test table to Google Spreadsheets and added the column names I needed.

I then imported the spreadsheet into a non-public Google Fusion Table.

For election night, I set up a script on my computer that does the following steps every two minutes (a rough sketch of the whole loop appears below):

1. Logs into AP's servers via FTP and downloads the flat file.

2. Deletes the data from the Google Fusion Table I made earlier and uploads the entire flat file anew. This is accomplished with a little Python program written by the brilliant (and patient) Kathryn Hurley, of the Google Fusion Tables team. I've posted it here with her permission. I don't know Python, but didn't need to. I just needed to make sure the list of columns in data_import.py exactly matched the columns in my table. So I cut-and-pasted them from the Google spreadsheet. The script executes the command:

python data_import.py [google account username] [flat file filepath] [fusion table id]

3. Next, it hits the Fusion Table with a simple URL request formatted to return the data I want as JSON. This is the URL I used for getting the county totals.

4. Then it sends that JSON as a file, via FTP, to a subdirectory of my map application on WNYC's servers.

Once a minute, the election map running in the user's browser looks at that data file to get the latest info.
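
Pulling those four steps together, here's a rough Python sketch of the loop -- not my actual election-night script. Hosts, logins, file names and the table ID are placeholders, and the Fusion Tables upload is handed off to Kathryn's data_import.py exactly as described above.

# Rough sketch of the every-two-minutes loop. Hosts, credentials, file names
# and the Fusion Table ID below are placeholders.
import subprocess
import time
from ftplib import FTP
from urllib.request import urlopen

def run_once():
    # 1. Download the AP flat file via FTP.
    with FTP("ftp.ap-example.com", "username", "password") as ap:
        with open("flatfile.txt", "wb") as f:
            ap.retrbinary("RETR flatfile.txt", f.write)

    # 2. Wipe and reload the Fusion Table using Kathryn Hurley's script.
    subprocess.run(["python", "data_import.py",
                    "me@example.com", "flatfile.txt", "1234567"], check=True)

    # 3. Ask the Fusion Table for county totals as JSON.
    query_url = "https://www.google.com/fusiontables/api/query?sql=..."  # your query here
    results = urlopen(query_url).read()
    with open("results.json", "wb") as f:
        f.write(results)

    # 4. Ship the JSON to the map application's web server via FTP.
    with FTP("ftp.webhost-example.com", "username", "password") as web:
        with open("results.json", "rb") as f:
            web.storbinary("STOR results.json", f)

while True:
    run_once()
    time.sleep(120)  # every two minutes

A cron job would work just as well as the while loop; the point is that the Fusion Table, not a homegrown database, is doing the heavy lifting.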

In this way, I completely avoided the need to build and maintain a database. I know there are great database folks out there, but I'm not one of them. The Fusion Table became my database.

Technically, I could skip steps 3 and 4 by simply pointing my map application at the Fusion Table to get the data it needs. That's what I did for Iowa, using the free Google data. But the table would be publicly visible on Google's servers ... and my reading of the AP contract is that it, understandably, doesn't allow that.

I strongly believe that the easier AP's data is to use, the more budding journocoders will make new election-night interactives. And if we can work together to do that, let's. For me, this method was a lot easier than anything else I had tried before.

A final note: If you're a Python-savvy programmer, be sure to check out what the LA Times has shared to make life easier, too. It's pretty slick.

Once Upon a Datum: Mapmaking on News Time

In September, I shared how WNYC makes news maps during a talk at the Online News Association conference.

UPDATE: ONA posted a video of this presentation, which I've embedded here:

'Once Upon a Datum': Telling Visual Stories from Online News Association on Vimeo.

Here are my presentation slides (PDF), and here's a list of links to pages and sites I discussed in my talk:

Same-Sex Couples in NYC: http://www.wnyc.org/articles/wnyc-news/2011/jul/14/census-shows-rising-number-gay-couples-and-dominicans/

Hispanic Origins in NYC: http://www.wnyc.org/articles/wnyc-news/2011/jul/14/census-shows-rising-number-gay-couples-and-dominicans/

The New Littles: http://www.wnyc.org/shows/bl/clusters/2011/jun/02/june-guest-andrew-beveridge-and-new-littles/

Marijuana Arrests: http://www.wnyc.org/articles/wnyc-news/2011/apr/27/alleged-illegal-searches/

Contributions by Zip Code: http://empire.wnyc.org/2011/07/where-are-the-mayoral-candidates-raising-their-money/

Dollars in a District: http://empire.wnyc.org/2011/09/the-54th-assembly-campaign-contribution-breakdown-where-have-all-the-in-district-donors-gone/

NYC Hurricane Evacuation Map: http://wny.cc/EvacZones

NYC DataMine: http://www.nyc.gov/html/datamine/html/data/geographic.shtml

Shpescape: http://www.shpescape.com/

Hurricane Zones Fusion Table: http://www.google.com/fusiontables/DataSource?dsrcid=964884

Fusion Tables Layer Builder: http://j.mp/FusionBuilder or http://gmaps-samples.googlecode.com/svn/trunk/fusiontables/fusiontableslayer_builder.html

Layer-wizard map from presentation: http://dl.dropbox.com/u/466610/preso-map.html

 


Mapping Dollars in a District

I loved this challenge.

WNYC's Colby Hamilton wanted to know: How much money was being raised by candidates for a state legislative district from within the district itself?

Answer: Very little.

Making this Map

This wasn't my typical "just upload it to Fusion Tables" project. It got geeky quickly, intentionally.

My method involved a PostgreSQL / PostGIS database and QGIS mapping software. Everything is free, which is amazing, though the tools take some advanced tinkering -- especially the database stuff.

First, I geocoded the donation addresses, getting each one's latitude and longitude, using this nifty batch geocoder. The donor's name and donation amount were also on each line.

Then I fed the data into my PostgreSQL database and pulled it into QGIS (they talk nicely together). I also layered in a shapefile of the district from the US Census Bureau.

I then asked QGIS where the donations and the district "intersect" -- and spit out the resulting shapefile for each candidate. 

Next I uploaded each candidate's "intersection" shapefile and their all-donations shapefile to Google Fusion Tables using shpescape. Once there, I used Fusion Tables' aggregation feature to total the donations in the district (the intersection).
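
If the PostGIS-and-QGIS route feels heavy, the core intersect-and-total step can also be sketched in Python with geopandas. This is just an illustration of the same idea, with hypothetical file and column names, not the process I actually used:

# Sketch: total the donations that fall inside the district boundary.
# File and column names are hypothetical.
import geopandas as gpd
import pandas as pd

donations = pd.read_csv("donations_geocoded.csv")          # name, amount, lat, lon
district = gpd.read_file("assembly_district_54.shp").to_crs(epsg=4326)

points = gpd.GeoDataFrame(
    donations,
    geometry=gpd.points_from_xy(donations["lon"], donations["lat"]),
    crs="EPSG:4326",
)

inside = gpd.sjoin(points, district, how="inner", predicate="within")
print(len(inside), "donations in the district, totaling", inside["amount"].sum())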

Fusion Tables also allowed me to plot all of the donations, along with the shape of the district. (Little trick: I actually copied the "geography" cell from the 54th District table and added it as a new row to the donations table. That way the donations and the district shape appear at the same time.)

Finally, I put the layers together into a map template I've grown since building 2010 Census maps.

You'll notice I'm not diving deep into the details here, but if you're looking into a similar project, drop me a note at john at johnkeefe dot net or look at this page, where I share every tidbit, command and SQL "select" statement I used.

Coulda Just Used Fusion Tables

The truth is, I could have used only Fusion Tables. The number of donations within the district turned out to be so small -- 69 in total -- that I could have simply uploaded the donations into Fusion Tables, letting it do the geocoding and the drawing of points and the district shape.

Then it's just a matter of clicking on every dot within the pink lines, adding up the donations in each bubble along the way.

Instead, I've created a process to do more complicated inside-an-area calculations. And to help others do them, too.

Making the WNYC Census Map

When the New York census numbers arrived this week, we were ready. WNYC quickly published an interactive, sharable map so New Yorkers -- and our reporters -- could explore the new data and see the stories.

We built the map with free tools and timely help from some smart, kind people.

[ Embedded interactive WNYC census map ]

The short story is that we mashed together population numbers and geographic shapes using Google Fusion Tables, and then used JavaScript and Fusion Tables' mapping features to make things pretty and interactive.

The long story is meandering and full of wrong turns. But here are the highlights, should anyone need a little navigation. Don't hesitate to contact me for more help and insight; I'm due to pay some forward.

Getting in Shape

First up: Shapes of the census tracts plotted on Earth. I downloaded New York's tracts from the U.S. Census Bureau's TIGER/Line Shapefile page. They also have counties, blocks, zip codes, and more.

Then I uploaded this "shapefile" -- actually a collection of related files zipped together -- to Fusion Tables with a free, online tool called Shpescape. (Thanks to Google's Rebecca Shapley for sharing this key to my puzzle.)

Hello, Data

Census data is publicly available, but can be a hassle to handle. In fact, on the day each state's info was released, the files were available in a set that apparently requires one of two pricey programs -- SAS or Microsoft Access -- to assemble. 

So I got clean, assembled, comma-delimited files -- complete with 2000-to-2010 comparisons -- from the USA Today census team, which provided them as a courtesy to members of Investigative Reporters and Editors. Huge props to Anthony DeBarros and Paul Overberg, who crunched the New York numbers in a blazing 30 minutes.

By the way, IRE membership is $60 for professionals and $25 for students. Well worth it, and cheaper than either of those programs. If you're digging into census numbers and qualify, I recommend this route.

That said, every state's 2010 data is now available free from the Census Bureau's American Fact Finder. Navigating the site is a little tricky, and worth a separate post, but the bureau provides some tutorials, and there's a very detailed PDF about each data field.

With data in hand, I uploaded it to Google Fusion Tables as a second table.

Map Making

Next, I merged the shapes table and population table, using the unique tract ID to marry the data (the shapefile calls it GEOID10; the IRE data calls it FULLTRACT). Note that the GEOID10 is formatted differently depending on whether you're using tracts, blocks, counties, etc., so be sure you've got the right match in both files.
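
If you ever need to do that same merge outside Fusion Tables, here's a minimal geopandas/pandas sketch. The file names are placeholders; the main thing is to keep the IDs as strings so the keys actually line up:

# Sketch: join census data to tract shapes on the tract ID.
# Keep the IDs as strings; if one side gets parsed as numbers, the keys won't match.
import geopandas as gpd
import pandas as pd

tracts = gpd.read_file("tl_2010_36_tract10.shp")                    # has GEOID10
data = pd.read_csv("ny_tract_data.csv", dtype={"FULLTRACT": str})   # has FULLTRACT

merged = tracts.merge(data, left_on="GEOID10", right_on="FULLTRACT", how="left")
print(merged[["GEOID10", "FULLTRACT"]].head())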

Clicking Visualize -> Map shows a map. It'll be all default-red until you click on Configure Styles -> Fill Colors -> Gradient (or ->Buckets) and make different colors appear depending on values in the column of your choice.

Using the Share button makes the map viewable by others, and "Get embeddable link" does just that.

Adding Prettiness

I used the Fusion Tables "Configure info window" option to make custom pop-up bubbles on our maps. This actually required some nicer-looking data, such as columns with rounded percentages and + or – signs. I added these using the free R statistical program, which I learned how to do from The New York Times' Amanda Cox at the 2011 Computer Assisted Reporting conference.

Census tracts officially extend to the state lines, which made it look like a lot of people live in the Hudson River. So I trimmed those tracts to the shorelines with a free mapping program called QGIS, using water shapefiles as a reference (those are here, in the drop-down menu).
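
The same shoreline trim can be scripted if you'd rather not click through QGIS. A minimal geopandas sketch of the idea, with placeholder file names (the water layer is the Census "area water" shapefile for each county):

# Sketch: clip tracts to the shoreline by erasing the water areas.
# File names are placeholders.
import geopandas as gpd

tracts = gpd.read_file("ny_tracts.shp")
water = gpd.read_file("tl_2010_36061_areawater.shp").to_crs(tracts.crs)

# "difference" keeps the parts of each tract that are not covered by water.
land_only = gpd.overlay(tracts, water, how="difference")
land_only.to_file("ny_tracts_trimmed.shp")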

After creating 12 merged Fusion Tables, I pulled them into one page using JavaScript and jQuery, with fantastic guidance from Joe Germuska at the Chicago Tribune (part of the team that built this great map).

The "Share/Embed this view" feature came together in two parts: 1) The JavaScript turns the current the latitude, longitude, zoom level and current map choice into a long URL that pops up when you click the Share/Embed link. 2) Using a nifty jQuery plug-in (updated link Dec. 2011), the map looks for those values in the URL that summoned it, and reorients to that map if they exist.

Prep Work

Clearly, not all of this could happen in a couple of hours on Data Day. I'd been tinkering, testing and tweaking for a few weeks using New Jersey's data, which came out much earlier.

I also wrote down, edited and revised every step I took to make the maps. So when the adrenaline was running I had a script to follow.

The WNYC Web Team also set up a slick, fresh project server, at project.wnyc.org, to host the HTML pages and track the traffic.

Fusion Function

Using Google Fusion Tables made it super easy to manage, map and serve up a lot of data. And the FT feedback team was fantastic about responding to questions and glitches I encountered along the way.

I did run into a couple of hiccups: slow load times and pop-up bubbles that failed to pop up. The first was a product of displaying so much data -- and I knew I was pushing things. The second was a Google glitch that their engineers managed to fix within a few hours, but was still spotty at times afterward.

Also, the Google Map engine starts dropping shapes when there are too many to show. So I funneled different counties' data into almost a dozen different layers, a workaround the Google folks showed me ahead of time.

That said, I had time to code and tweak lots of neat things because I didn't have to focus on building or running a database engine. Google's free services took care of that.

What Could Be Better?

Probably a lot. I wanted to let people add comments, right on the map, but didn't have the chops or time to pull that off.

Another good thing would be a "Loading ..." indicator displayed while the map data is pulled into your browser, which I may yet add.

But what couldn't have been better was everything I learned, the help I got from other data folks and the support from my WNYC colleagues. Plus we gave New Yorkers a pretty nifty service and several great stories.

Need more details? Feel free to ask questions in the comments. Or drop me a line. I'll try to help, too.

Open All Night: The Great Urban Hack NYC

For 26 hours this weekend, a bunch of journalists and coders got together to make lots of great things designed to help the citizens of New York City.

My blog post summarizing the event and all of the resulting projects is up on Hacks/Hackers.

I helped Jenny 8. Lee, Chrys Wu and Stephanie Pereira organize the event. Then I joined a team working with digital heaps of NYC taxi trip data to make data visualizations and start some other projects. My favorite one is here (with a detail below), which is a representation of taxi usage for 24 hours, set around a clock. Beautiful. It was built by Zoe Fraade-Blanar using Processing and data crunched by the other teammates.

[ Image: detail of the taxi-clock visualization; click for full view ]

I came away from the event with many new connections, excitement about learning Processing, some more skills in Sinatra and a note to check out Bees with Machine Guns(!)