Automating the Extraction of GIS Data from Scanned Maps

Caitlin Dempsey


The New York Public Library Labs (NYPL Labs) has posted the code for its open source map-vectorizer project on GitHub.  The project seeks to automate (“like OCR for maps”) the process of extracting polygon and attribute information from old scanned maps.

Extracting Building Information from Historical Maps

The code was developed to extract building information from New York City insurance atlases published in the 19th and early 20th centuries.  The NYPL holds hundreds of these atlases, which together contain thousands of map sheets.

As NYPL Labs explains on the project's README page, the process has saved thousands of hours in creating GIS data from old scanned maps:  “[I]t took NYPL staff coordinating a small army of volunteers three years to produce 170,000 polygons with attributes (from just four of hundreds of atlases at NYPL).  It now takes a period of time closer to 24 hours to generate a comparable number of polygons with some basic metadata.”

Map Vectorizer Project

Currently, the map-vectorizer project can extract polygon shapes and color attribute information from scanned maps.  Future planned enhancements include extracting dot presence, dot count, and dot type (full vs outline).
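As a rough illustration of what a polygon with a color attribute might look like as GIS data, the following minimal Python sketch assigns a sampled color to the nearest entry in a small reference palette and wraps the result as a GeoJSON-style feature. The palette values, the color labels, and the function names are hypothetical and are not taken from the map-vectorizer code; the coordinates are plain pixel positions rather than georeferenced ones.

```python
import json
import math

# Hypothetical reference palette (RGB values); not taken from the project.
PALETTE = {
    "pink": (230, 170, 170),
    "yellow": (235, 220, 150),
    "blue": (150, 180, 220),
}


def nearest_color(rgb, palette=PALETTE):
    """Return the palette label closest to rgb by Euclidean distance."""
    return min(palette, key=lambda name: math.dist(rgb, palette[name]))


def polygon_feature(ring, mean_rgb):
    """Build a GeoJSON-style Feature for one extracted polygon."""
    coords = [list(point) for point in ring]
    # GeoJSON rings are closed: the first and last positions are identical.
    if coords[0] != coords[-1]:
        coords.append(coords[0])
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        "properties": {"color": nearest_color(mean_rgb)},
    }


# One rectangular outline in pixel coordinates with a pinkish average color.
feature = polygon_feature([(0, 0), (10, 0), (10, 8), (0, 8)], (228, 175, 168))
print(json.dumps(feature["properties"]))
```

Matching against a fixed palette is only one way to turn a sampled color into an attribute; the project itself may derive color information differently.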


Sample polygon extraction results from a section of an insurance map.

To use the project code, the following dependencies must already be installed on your machine: Python with OpenCV, ImageMagick, R, GIMP, and GDAL Tools (full details are available on the GitHub project page).
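Before running the project, it can help to confirm that these tools are reachable from your environment. The short Python sketch below is one way to do that; it is not part of the map-vectorizer project, and the executable names it looks for (`convert`, `Rscript`, `gimp`, `gdalinfo`) are assumptions that may differ from what the project actually calls or from what your installation provides.

```python
import importlib.util
import shutil

# Assumed executable names; the commands the project invokes may differ.
COMMANDS = {
    "ImageMagick": "convert",
    "R": "Rscript",
    "GIMP": "gimp",
    "GDAL Tools": "gdalinfo",
}


def check_dependencies(commands=COMMANDS, module="cv2"):
    """Map each dependency name to True if it appears to be installed."""
    # shutil.which returns None when the executable is not on the PATH.
    found = {name: shutil.which(exe) is not None for name, exe in commands.items()}
    # OpenCV's Python bindings are imported as the "cv2" module.
    found["Python with OpenCV"] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A "missing" result only means the assumed command was not found on the PATH; the GitHub project page remains the authority on what needs to be installed.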

Building Inspector


You can help with QA/QC of the polygon extraction results through NYPL Labs’ Building Inspector program.  Users are run through a short tutorial on how to review computer-generated outlines and determine whether each one matches an actual building.

Users can select from three options: “no” when the shape is not a building outline, “yes” when the outline matches a building, and “fix” when the outline covers a building but needs correction to match the true outline.
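To show how answers from several users could be combined into a single verdict for one outline, here is a minimal Python sketch. The minimum vote count, the agreement threshold, and the function name are hypothetical; this is not a description of how Building Inspector actually aggregates responses.

```python
from collections import Counter

# The three answers described above.
VALID_VOTES = {"yes", "no", "fix"}


def consensus(votes, min_votes=3, threshold=0.75):
    """Return the agreed answer for one outline, or None if undecided."""
    # Ignore anything that is not one of the three valid answers.
    counts = Counter(v for v in votes if v in VALID_VOTES)
    total = sum(counts.values())
    if total < min_votes:
        return None
    answer, top = counts.most_common(1)[0]
    # Accept the leading answer only if enough voters agree on it.
    return answer if top / total >= threshold else None


print(consensus(["yes", "yes", "yes", "fix"]))  # 3 of 4 agree: yes
print(consensus(["yes", "no", "fix"]))          # no agreement: None
```

Outlines that come back undecided could simply be shown to more users until the threshold is met.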

The program will help improve the information extracted from 19th-century New York City insurance atlases.


About the author
Caitlin Dempsey
Caitlin Dempsey is the editor of Geography Realm and holds a master's degree in Geography from UCLA as well as a Master of Library and Information Science (MLIS) from SJSU.