Note: This post was originally published April 29, 2011, and updated in June 2020. In February 2022, I updated it again using 2020 Census data.
Anyone doing population analysis by NYC police precinct might find this post helpful, especially if you're interested in race and/or ethnicity analysis by precinct.
Back in 2011, I wanted to compare the racial and ethnic breakdown of low-level marijuana arrests, which are reported by police precinct, with that of the general population. The population data, of course, is available from the US Census, but it isn't broken out by police precinct, and precincts don't follow any major census boundaries like census tracts. Instead, they generally follow streets and shorelines. Fortunately, census blocks (which, in New York, are often just city blocks) also follow streets and shorelines.
So I used US Census block maps and precinct maps from the city to figure out which blocks are in which precincts. Since population data is available at the block level, that data can then be aggregated into precincts.
In this, the third version of this post, I've updated the counts now that the 2020 population data is available.
The 2020 data
• nyc_precinct_2020pop.csv is the 2020 Census population, race, and ethnicity (Hispanic/non-Hispanic) data by NYPD police precinct. The column headers from the US Census are a little cryptic, but you can translate them using the P1 table metadata file and the P2 table metadata file.
• nyc_block_precinct_2020pop.csv lists every populated block in NYC by its ID (called "GEOID20"), matched to the police precinct it sits within, along with the block's race/ethnicity information. Use the same metadata tables to translate the column headers. Also be sure to read about the caveats below.
• nyc_precincts.geojson depicts the geographic boundaries of the NYPD precincts I used for the files above, as they existed in February 2022. As of this post, the information on the NYC Open Data portal indicates it was last updated on Nov 24, 2021.
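The block IDs in these files pack several geographic codes into one string. Here is a minimal sketch of how to take one apart, assuming the standard 15-digit census block layout (2-digit state, 3-digit county, 6-digit tract, 4-digit block). The helper name split_block_geoid is my own illustration and does not appear in the files.

```python
from typing import NamedTuple


class BlockGeoid(NamedTuple):
    state: str
    county: str
    tract: str
    block: str


def split_block_geoid(geoid: str) -> BlockGeoid:
    """Split a 15-digit census block GEOID into its component codes."""
    geoid = str(geoid).strip()
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
    # Fixed-width slices: state (2), county (3), tract (6), block (4).
    return BlockGeoid(geoid[0:2], geoid[2:5], geoid[5:11], geoid[11:15])


# One of the Bronx blocks discussed in the caveats.
parts = split_block_geoid("360050334001003")
print(parts)
```

It is usually safest to read these IDs as text rather than numbers, so that spreadsheet or data tools don't reformat them.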
Caveats for the 2020 data
The biggest caveat is that the US Census has introduced data fuzziness, or "noise," to make it difficult to identify individuals based on census data. This fuzziness is more pronounced at smaller geographies — the smallest being census blocks, which I've used for these calculations. Hansi Lo Wang did a great primer on these data protections for NPR, and the US Census Bureau has put out a lot of material on how it uses "differential privacy."
I have not determined if and how this fuzziness affects the calculations I've done by police precinct. My general understanding is that, since I'm aggregating many blocks (almost 40,000 of them) into 77 police precincts, the error will wash away or be insignificant. But I don't know that for certain. (If you have more insight, please let me know!) I do think the data for an individual block, however, should be treated with caution.
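To illustrate the intuition that noise shrinks in relative terms when many blocks are summed, here is a toy simulation. It is only a sketch under strong assumptions: every number in it is hypothetical, and it treats the noise as independent, zero-mean, and Gaussian, which is not necessarily how the Census Bureau's disclosure avoidance actually behaves. It does not settle the open question above.

```python
import random
import statistics

random.seed(2020)

# All values below are hypothetical, chosen only for illustration.
N_BLOCKS = 500   # roughly 40,000 blocks spread over 77 precincts
TRUE_POP = 60    # pretend every block truly has 60 residents
NOISE_SD = 5     # made-up noise level; NOT a Census Bureau figure


def simulate_precinct_error() -> float:
    """Return the relative error of one simulated precinct total."""
    noisy_total = sum(
        TRUE_POP + random.gauss(0, NOISE_SD) for _ in range(N_BLOCKS)
    )
    true_total = TRUE_POP * N_BLOCKS
    return abs(noisy_total - true_total) / true_total


# Noise relative to a single block's population.
block_relative_error = NOISE_SD / TRUE_POP

# Average relative error of the precinct total over many simulated precincts.
precinct_relative_errors = [simulate_precinct_error() for _ in range(200)]
typical_precinct_error = statistics.mean(precinct_relative_errors)

print(f"per-block relative noise: {block_relative_error:.1%}")
print(f"typical precinct-total relative error: {typical_precinct_error:.2%}")
```

Under these assumptions the error in a total grows with the square root of the number of blocks while the total itself grows linearly, so the relative error falls. If the real noise is correlated across blocks, that conclusion may not hold.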
Separately, there are two apparently inhabited Bronx blocks that straddle two police precincts, and I've placed each entirely in one or the other. Block 360050334001003 is said to contain 50 residents, though it consists entirely of parkland called the Bronx River Forest. It straddles the 47th and 49th precincts, and I've placed it in the 47th. Block 360050096002000 appears to be entirely commercial, but is said to have 33 residents. It straddles the 45th and 43rd precincts, and I've placed it in the 45th.
On issuing the command "make all", the Makefile downloads the census block geographic files from the US Census Bureau and the precincts from the New York City data portal. It then uses the Census Bureau's block centroids to place each block within a police precinct and generates files used in the notebook. If you try it on your own, be sure to change the "mypath" variable in the Makefile and the notebook to the full path of a directory where you want to store the files.
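The centroid step can be sketched in plain Python. This is not the code in the Makefile, which relies on Mapshaper; it is a minimal illustration of the idea using a standard ray-casting point-in-polygon test. The precinct shapes, block names, and function names here are all hypothetical toy values.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: flip 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge span the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_precinct(
    centroid: Point, precincts: Dict[str, Polygon]
) -> Optional[str]:
    """Return the first precinct containing the centroid, or None."""
    for name, polygon in precincts.items():
        if point_in_polygon(centroid, polygon):
            return name
    return None


# Hypothetical toy geometry: two square "precincts" side by side.
precincts = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
block_centroids = {
    "block-1": (1.0, 2.0),
    "block-2": (6.5, 1.0),
    "block-3": (9.0, 9.0),  # centroid falls outside every precinct
}

assignments = {
    block: assign_precinct(centroid, precincts)
    for block, centroid in block_centroids.items()
}
print(assignments)
```

Real precinct boundaries are multi-part shapes in geographic coordinates, so a GIS tool is the better choice for actual work. The sketch only shows why a block whose centroid lands outside every boundary ends up unassigned.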
I also generate a couple of "confidence check" files that allow me to use the QGIS open-source mapping program to visually check how the calculations line up. I color the blocks by precinct number and then layer the precinct boundaries on top to see where any of the colors cross (that is the map atop this post).
There were, in fact, 43 blocks that didn't fall into precincts — largely because their centroids fell outside the precinct lines. Only 9 of those were actually populated, and I manually updated the precincts for those blocks.
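Here is one way that cleanup could look in code: find the blocks with no precinct, keep only the populated ones, and apply hand-checked assignments. It is a sketch with hypothetical block IDs, populations, and precinct numbers, not the actual list of blocks I fixed.

```python
from typing import Dict, Optional

# Hypothetical results of the centroid step: block ID -> precinct (or None).
block_precinct: Dict[str, Optional[str]] = {
    "block-1": "47",
    "block-2": None,
    "block-3": None,
    "block-4": "49",
}

# Hypothetical block populations.
block_population: Dict[str, int] = {
    "block-1": 120,
    "block-2": 0,
    "block-3": 35,
    "block-4": 80,
}

# Blocks the centroid test could not place in any precinct.
unmatched = [b for b, p in block_precinct.items() if p is None]

# Only populated blocks affect the precinct totals.
populated_unmatched = [b for b in unmatched if block_population[b] > 0]

# Assignments decided by looking at the map (hypothetical values).
manual_overrides: Dict[str, str] = {"block-3": "49"}

# Fail loudly if a populated block still has no precinct.
missing = [b for b in populated_unmatched if b not in manual_overrides]
if missing:
    raise ValueError(f"populated blocks without a precinct: {missing}")

resolved = {**block_precinct, **manual_overrides}
print(resolved)
```

Unpopulated blocks can stay unassigned without changing any precinct's population count.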
The population data is available from the census data site, which takes some navigating.
Once I had the blocks connected to precincts, and the population data attached to the blocks, a basic pivot table gave me the final results.
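The pivot-table step amounts to summing each population column by precinct. Below is a minimal sketch using only the Python standard library. The sample rows are made up, and the column names (GEOID20, precinct, P1_001N, P2_002N) are assumptions about the file layout, so check them against the real headers before reusing this.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for the block-level file.
sample_csv = """GEOID20,precinct,P1_001N,P2_002N
block-a,1,100,20
block-b,1,50,5
block-c,5,200,80
block-d,5,10,0
"""

# Columns to sum; assumed names for total population and Hispanic population.
value_columns = ["P1_001N", "P2_002N"]

# precinct -> {column: running total}
totals = defaultdict(lambda: dict.fromkeys(value_columns, 0))

for row in csv.DictReader(io.StringIO(sample_csv)):
    for column in value_columns:
        totals[row["precinct"]][column] += int(row[column])

for precinct in sorted(totals, key=int):
    print(precinct, totals[precinct])
```

To run it on a real file, swap io.StringIO(sample_csv) for an open file handle and extend value_columns with whichever census columns you need.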
The 2010 data, archived
For anyone needing the 2010 files, I'm archiving the materials from my original posts here. This is, to be clear, based on the 2010 census, and is not the latest data available.
The original stories I did this for, and the Google Fusion Tables where the data lived, have all disappeared into internet history. But I've recreated the data here. Some precinct boundaries have changed slightly since 2011, and those on Staten Island changed significantly with the addition of a fourth precinct on the island in 2013.
• 2010pop_2020precincts.csv is the 2010 population breakdown within each precinct, with the precincts as they were drawn in June 2020. The column headings are cryptic, but follow the codes starting on this page, which is from this rather large Census Bureau PDF.
• precinct_block_key_2020.csv is the Rosetta Stone for this project. It has two columns: each block's identifier, which the census calls "geoid10," and the precinct in which that block sits. Note that some blocks aren't in any precinct, usually because they're actually in the water.
• nyc_2010censusblocks_2020policeprecincts.csv contains base-level 2010 Census data for each block, married to the precinct for that block. For descriptions of the population columns, follow the codes starting on this page or see pages 6-21 in the Census Bureau PDF.
Caveats for the 2010 data
I did my best to be accurate in computing the intersection of blocks and precincts, even generating precinct maps and inspecting them visually. But errors may exist. You can check my math in the Jupyter notebooks I used.
Census blocks generally fall nicely within precinct outlines, but they don't always. In particular, three blocks significantly straddle two precincts. If you're doing very precise analysis, you'll want to account for them:
• Block 360470071002003: An area near the north end of the Gowanus Canal in Brooklyn. About half is in Precinct 76 and half in Precinct 78. Total people: 51.
• Block 360050096002000: Mainly industrial. Half in Precinct 43, half in Precinct 45. Total people: 5.
• Block 360610265003001: This block consists of five similar-sized apartment buildings near the George Washington Bridge. The northern set of buildings is in the 34th Precinct, with part of one building in the 33rd, for what looks like roughly an 80/20 split. I put the entire block, and the 687 people living there, in the 34th Precinct.
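If you'd rather split a straddling block than assign it whole, one option is to apportion its population by an estimated share. The sketch below applies the rough 80/20 eyeball estimate to the 687 residents of the last block above. It assumes people are spread evenly across the block, which may not be true, and the apportion helper is my own illustration rather than part of the published notebooks.

```python
import math
from typing import Dict


def apportion(population: int, shares: Dict[str, float]) -> Dict[str, int]:
    """Split a whole-number population by share, keeping the total intact."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    exact = {key: population * share for key, share in shares.items()}
    # Start from the rounded-down counts.
    result = {key: math.floor(value) for key, value in exact.items()}
    # Hand leftover people to the largest fractional remainders first.
    leftover = population - sum(result.values())
    by_remainder = sorted(
        exact, key=lambda key: exact[key] - result[key], reverse=True
    )
    for key in by_remainder[:leftover]:
        result[key] += 1
    return result


# Rough 80/20 estimate between the 34th and 33rd precincts.
split = apportion(687, {"34": 0.8, "33": 0.2})
print(split)
```

The largest-remainder rounding keeps the two pieces adding up to the block's original count, so precinct totals still sum to the citywide total.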
I originally did this work while at WNYC, using PostgreSQL, PostGIS and QGIS. I was helped by the generosity and insights of Jeff Larson, Al Shaw, and Jonathan Soma. The 2020 data is assembled mainly using the great Mapshaper tool by Matthew Bloch.
If you find this information useful, find a bug, or know how the new census data-fuzzing might affect these calculations, flag me on Twitter @jkeefe. I'd love to hear from you.