This week, we published a map showing total NYPD stop-and-frisks by block, together with the locations where guns were discovered during those stops.
In the tradition of showing our work, here's some information about how we built it -- and data you can download and explore yourself.
The two major bumps I hit working with the NYPD's Stop, Question and Frisk data sets were: 1) the files are in a format I didn't know, and 2) the geographic locations aren't given in latitude and longitude.
For bump #1, I used the free statistical program "R" to convert the NYPD's ".por" files (SPSS portable files) into something I could use. R is also great at handling big data sets, and it easily managed the 685,724 stops in the 2011 file.
For bump #2, I noticed that each stop had data fields called "XCOORD" and "YCOORD." A couple of tests confirmed that those values described the stop's position on the New York-Long Island State Plane Coordinate System -- something I've seen in a lot of city data. So I used the free geographic software QGIS to load in the data and convert (technically, reproject) those coordinates into latitudes and longitudes.
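QGIS does the reprojection through its menus, but the math underneath is a standard Lambert Conformal Conic conversion. Here's a minimal plain-Python sketch of it, assuming XCOORD/YCOORD are in US survey feet on the NAD83 New York Long Island zone (EPSG:2263: standard parallels 40°40'N and 41°02'N, origin 40°10'N 74°W, false easting 300,000 meters). Those parameters are an assumption about the files, not something stated in them, and for real work a tested library such as pyproj is the safer choice.

```python
import math

# ASSUMED projection: NAD83 / New York Long Island (US survey feet), EPSG:2263.
A = 6378137.0                       # GRS80 semi-major axis, meters
F_INV = 298.257222101               # GRS80 inverse flattening
E = math.sqrt(2 / F_INV - 1 / F_INV ** 2)  # first eccentricity
LAT1 = math.radians(41 + 2 / 60)    # standard parallel 41°02'N
LAT2 = math.radians(40 + 40 / 60)   # standard parallel 40°40'N
LAT0 = math.radians(40 + 10 / 60)   # latitude of origin 40°10'N
LON0 = math.radians(-74.0)          # central meridian 74°W
X0, Y0 = 300000.0, 0.0              # false easting / northing, meters
US_FT = 1200 / 3937                 # meters per US survey foot


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


# Cone constants for the ellipsoidal Lambert Conformal Conic projection.
N = (math.log(_m(LAT1)) - math.log(_m(LAT2))) / (math.log(_t(LAT1)) - math.log(_t(LAT2)))
F = _m(LAT1) / (N * _t(LAT1) ** N)
RHO0 = A * F * _t(LAT0) ** N


def to_state_plane(lat_deg, lon_deg):
    """Forward projection: degrees -> state plane (x, y) in US survey feet."""
    rho = A * F * _t(math.radians(lat_deg)) ** N
    theta = N * (math.radians(lon_deg) - LON0)
    x = X0 + rho * math.sin(theta)
    y = Y0 + RHO0 - rho * math.cos(theta)
    return x / US_FT, y / US_FT


def to_lat_lon(x_ft, y_ft):
    """Inverse projection: state plane (x, y) in US feet -> (lat, lon) in degrees."""
    dx = x_ft * US_FT - X0
    dy = RHO0 - (y_ft * US_FT - Y0)
    rho = math.hypot(dx, dy)        # N > 0 for this zone, so rho stays positive
    theta = math.atan2(dx, dy)
    t = (rho / (A * F)) ** (1 / N)
    phi = math.pi / 2 - 2 * math.atan(t)  # first guess, refined below
    for _ in range(20):
        es = E * math.sin(phi)
        nxt = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (E / 2))
        done = abs(nxt - phi) < 1e-12
        phi = nxt
        if done:
            break
    return math.degrees(phi), math.degrees(theta / N + LON0)
```

As a sanity check, `to_lat_lon(984250.0, 0.0)` lands on the projection's origin, about (40.1667, -74.0), and converting a point forward and back should return the same latitude and longitude.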
And now you can have the data I used to make the map. Just click to download:
(4.3MB download, unzips to 12MB)
Contains a shapefile of all NYC blocks with the total stop-and-frisks calculated for each block, a shapefile with the points for all stops where guns were found, raw data on each of the 768 stops where guns were found, and notes about each data set. Here's more detail on the contents.
(51MB download, unzips to 500MB)
This file has all of the above, plus a .csv with the raw data for all 685,724 stops in 2011. While it's in a more common format than what the NYPD provides, it's too big to open in Excel and exceeds the limits of Google Fusion Tables. So you'll need a stats program like R or some database know-how to handle it.
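If you'd rather not install a stats package, a few lines of Python can stream through the big .csv one row at a time instead of loading it all into memory. This minimal sketch counts every stop and the stops where a gun turned up. The column names it checks (`pistol`, `riflshot`, `asltweap`, `machgun`, with "Y" meaning found) are an assumption based on the NYPD's field naming, so check them against the notes in the download before trusting the counts.

```python
import csv
import io

# ASSUMED "gun found" flag columns; verify against the data notes.
GUN_FIELDS = ("pistol", "riflshot", "asltweap", "machgun")


def count_stops(lines):
    """Stream CSV text; return (total stops, stops with any gun flag set to 'Y')."""
    total = 0
    with_gun = 0
    for row in csv.DictReader(lines):
        total += 1
        # `or ""` guards against short rows, where DictReader fills in None.
        if any((row.get(f) or "").strip().upper() == "Y" for f in GUN_FIELDS):
            with_gun += 1
    return total, with_gun


if __name__ == "__main__":
    # Three made-up rows, for illustration only.
    demo = io.StringIO(
        "pct,pistol,riflshot,asltweap,machgun\n"
        "1,N,N,N,N\n"
        "2,Y,N,N,N\n"
        "3,N,N,N,N\n"
    )
    print(count_stops(demo))
```

For the real file, open it with `open(path, newline="")` and pass the file object to `count_stops`; only one row is held in memory at a time, which is why this copes with a 500MB file.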
For the map itself, I used TileMill. Besides providing wonderful control over styles and colors, it solves an important problem: New York City has roughly 38,500 census blocks -- and loading the data to draw them all onto a Google map will anger any browser. With TileMill, you bake the data into individual image tiles, which get served up to the user as they zoom and pan.
To cover the area of NYC and provide 8 levels of zoom, I pre-cooked 59,095 tiles. But once they're uploaded to the MapBox server, which took about 15 minutes, they load almost instantly.
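TileMill and MapBox use the common "slippy map" Web Mercator tiling, where each zoom level splits every tile into four, so the tile count climbs fast as you add levels. Here's a rough sketch for estimating how many tiles a bounding box needs across a range of zooms. The NYC bounding box and zoom range in the usage note are illustrative assumptions, not the settings behind this map, so don't expect it to reproduce the 59,095 figure exactly.

```python
import math


def tile_xy(lat_deg, lon_deg, zoom):
    """Return the (x, y) Web Mercator tile indices containing a point."""
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so points on the outer edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(south, west, north, east, zoom):
    """Count tiles covering a lat/lon bounding box at a single zoom level."""
    x_min, y_min = tile_xy(north, west, zoom)  # tile y grows southward
    x_max, y_max = tile_xy(south, east, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def total_tiles(south, west, north, east, min_zoom, max_zoom):
    """Sum the tile counts over every zoom level from min_zoom to max_zoom."""
    return sum(tiles_for_bbox(south, west, north, east, z)
               for z in range(min_zoom, max_zoom + 1))
```

For example, `total_tiles(40.48, -74.26, 40.92, -73.70, 10, 17)` estimates eight zoom levels over an approximate NYC bounding box. Because each extra level roughly quadruples the count once the area spans more than a few tiles, the deepest zoom levels account for most of the total.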
As always, I welcome comments and questions below or at john (at) johnkeefe.net.