Extracting Geospatial Data from Historical Maps

Mark Altaweel


There have been major advancements in recent years in automated approaches that read data from scanned images. Such approaches help to automatically index text so that it becomes searchable and machine readable. However, when it comes to map data, there has been only limited work until now.

New approaches can not only detect the names of locations and make them searchable, but also give us the ability to recover data largely missing from modern maps.

Linking Map Labels with Geographic Data

One new project that has helped make the text and visual data in old maps indexable and readable is the mapKurator project. The project uses deep learning metric methods that associate map labels with the areas they describe.

One challenge for automated approaches to reading maps is to identify location labels, including names made up of more than one word, and to associate the text with a specific location. With an appropriate distance metric between a place and its label, and between the individual words so that it is clear which terms belong together, each piece of text can be assigned to the location it refers to.
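The distance-based idea can be illustrated with a small sketch. It is not the project's own method: the word positions, the place markers, the 40-pixel `max_gap` threshold, and the function names are all hypothetical, and plain Euclidean distance stands in for whatever learned metric a real system would use.

```python
from math import hypot

# Hypothetical detections: (word, x, y) centre points in pixel coordinates.
WORDS = [("New", 100, 50), ("York", 130, 50), ("Boston", 400, 30)]
# Hypothetical place markers already located on the same map image.
PLACES = {"site_a": (118, 62), "site_b": (395, 45)}


def group_words(words, max_gap):
    """Greedily merge words whose centres lie within max_gap pixels."""
    groups = []
    # Sorting by x keeps words of one label in left-to-right reading order.
    for word in sorted(words, key=lambda w: (w[1], w[2])):
        for group in groups:
            if any(hypot(word[1] - g[1], word[2] - g[2]) <= max_gap for g in group):
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


def link_labels(words, places, max_gap=40):
    """Map each (possibly multi-word) label to the nearest place marker."""
    links = {}
    for group in group_words(words, max_gap):
        text = " ".join(w[0] for w in group)
        # Use the centroid of the grouped words as the label position.
        cx = sum(w[1] for w in group) / len(group)
        cy = sum(w[2] for w in group) / len(group)
        links[text] = min(
            places,
            key=lambda name: hypot(places[name][0] - cx, places[name][1] - cy),
        )
    return links


print(link_labels(WORDS, PLACES))
```

Here the two words that sit close together are joined into one label before the label is matched to a place, which is the step that keeps multi-word names from being split across locations.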

A 16th century map of the Scandinavian peninsula with ships and sea creatures in the water.
Historical maps contain a trove of geographic information in text format. Map of the Scandinavian peninsula by Olaus Magnus, 2nd edition, 1572. Via Library of Congress.

In this new approach, a visual predictor segments a given area and links each text item to the bounding area of a location. The key output is a probability map that classifies map pixels by text label so that the linkages can be made with a degree of confidence.
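As a rough illustration of how a probability map might be turned into labelled pixels, the sketch below picks the most probable label for each pixel and leaves the pixel unassigned when confidence is low. The probability values, the label names, the 0.5 threshold, and the function name `assign_pixels` are assumptions made for the example, not details taken from the project.

```python
# Hypothetical per-label probability maps over a 2 x 3 pixel patch.
PROB_MAPS = {
    "Stockholm": [[0.90, 0.80, 0.10], [0.70, 0.20, 0.10]],
    "Uppsala": [[0.05, 0.10, 0.60], [0.10, 0.30, 0.85]],
}


def assign_pixels(prob_maps, threshold=0.5):
    """Give each pixel its most probable label, or None below the threshold."""
    labels = list(prob_maps)
    rows = len(prob_maps[labels[0]])
    cols = len(prob_maps[labels[0]][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            best = max(labels, key=lambda name: prob_maps[name][r][c])
            row.append(best if prob_maps[best][r][c] >= threshold else None)
        result.append(row)
    return result


for row in assign_pixels(PROB_MAPS):
    print(row)
```

Leaving low-confidence pixels unassigned is one simple way to express the "degree of confidence" mentioned above; a real system could instead keep the raw probabilities for later ranking.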

A key benefit of the new approach is that it makes map features and data, including those indicated by text, searchable within and across maps so that linkages between data can be made. For instance, a search for elevations above 1,000 feet on old maps becomes possible using only the image as input, with the elevation extracted automatically from the visual data and labels.
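Once labels have been extracted, a search like the elevation example reduces to a filter over the extracted records. The sketch below assumes a hypothetical record layout (`map_id`, `label`) and assumes that any number in a label is an elevation in feet, which a real pipeline would need to verify.

```python
import re

# Hypothetical labels already extracted from two scanned maps.
RECORDS = [
    {"map_id": "map_001", "label": "Summit 1250"},
    {"map_id": "map_001", "label": "Mill Creek"},
    {"map_id": "map_002", "label": "Ridge 980"},
    {"map_id": "map_002", "label": "Peak 4310"},
]


def labels_above(records, min_feet):
    """Return (map_id, label, elevation) for labels with a number above min_feet."""
    hits = []
    for rec in records:
        # Assumption: a standalone 2-5 digit number in a label is an elevation in feet.
        match = re.search(r"\b(\d{2,5})\b", rec["label"])
        if match and int(match.group(1)) > min_feet:
            hits.append((rec["map_id"], rec["label"], int(match.group(1))))
    return hits


print(labels_above(RECORDS, 1000))
```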

This effort has recently led to a collaboration between the researchers and the Rumsey Map Collection, which will enable georeferenced maps to be searched by text. This has made it the largest map repository that can be searched using text. In effect, this can make old maps as usable as OpenStreetMap for searching and identifying places of interest.[1]

Using Deep Learning Methods to Extract Geo-Metadata from Maps

Recent approaches have also used deep learning methods, such as convolutional neural networks (CNNs) with data augmentation, to classify and recognize geographic areas simply from the shapes of existing geographic features.

For instance, providing the shapes of countries or continents from maps enables these geographic features to be classified by their geographic names. This helps to limit the amount of training data required, making it easier for researchers to create approaches that automate location detection on maps, including older maps that have only been scanned.[2]
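To show why augmentation reduces the need for training data, the sketch below produces several labelled variants from a single outline. The rotation and scaling transforms, the triangle outline, and the name `example_country` are stand-ins chosen for illustration; they are not necessarily the GIS-based augmentations used in the cited work.

```python
from math import cos, radians, sin


def transform(polygon, angle_deg=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Rotate about the origin, then scale and translate a list of (x, y) points."""
    a = radians(angle_deg)
    out = []
    for x, y in polygon:
        xr = scale * (x * cos(a) - y * sin(a)) + shift[0]
        yr = scale * (x * sin(a) + y * cos(a)) + shift[1]
        # Rounding removes floating-point noise such as 6e-17 instead of 0.
        out.append((round(xr, 6), round(yr, 6)))
    return out


def augment(name, polygon, angles, scales):
    """Create one labelled variant per (angle, scale) combination."""
    return [(name, transform(polygon, a, s)) for a in angles for s in scales]


# Hypothetical outline standing in for a country or continent shape.
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
samples = augment("example_country", TRIANGLE, angles=[0, 90, 180], scales=[1.0, 2.0])
print(len(samples), "training shapes from one outline")
```

Each variant keeps the original name as its label, so one source shape yields many training examples for a classifier.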

Extracting Geographic Data from Historical Maps

In recent years, scholars have also made progress in various feature identification and extraction using historical maps. For instance, knowing where old roads are can be important in reconstructing places and historical data about a region.

Researchers have been able to extract old roads and trails from historical maps using fully convolutional networks pre-trained with annotated road data. The models could then identify different types of roads on maps from different periods.[3]

A figure showing two columns of maps, two columns of ground-truth geo data extraction, and two columns of predicted geo data from those historical maps.
Using fully convolutional networks pre-trained with annotated road data, researchers were able to retrieve historical roads and trails from historical maps. From left to right in this figure: input patch, ground truth mask, and produced road type prediction by the deep neural network model. Figure: Ekim, Sertel, & Kabadayı, 2021, CC BY 4.0.
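A comparison between a ground truth mask and a predicted mask, as in the figure, is often summarized with intersection over union (IoU) per class. The sketch below computes it on tiny hypothetical masks; the class codes (0 for background, 1 and 2 for two road types) are assumptions, and IoU is shown as a common segmentation metric rather than as the specific evaluation used in the cited study.

```python
# Hypothetical 2 x 3 masks: 0 = background, 1 and 2 = two road types.
TRUTH = [[0, 1, 1], [0, 2, 2]]
PRED = [[0, 1, 0], [0, 2, 2]]


def iou_per_class(truth, pred, classes):
    """Intersection over union for each class; None if the class never appears."""
    scores = {}
    for cls in classes:
        inter = union = 0
        for t_row, p_row in zip(truth, pred):
            for t, p in zip(t_row, p_row):
                inter += int(t == cls and p == cls)
                union += int(t == cls or p == cls)
        scores[cls] = inter / union if union else None
    return scores


print(iou_per_class(TRUTH, PRED, classes=[1, 2]))
```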

Other work has also enabled automated identification and classification of given symbols on old maps so that they can be digitized and used for more modern mapping. Similar to the road identification and map augmentation work, deep learning using variations of CNNs has enabled symbols to be more easily identified so that they can be classified for use.[4]

Deep Learning to Automate the Extraction of Geo Data from Historical Maps

Increasingly, scholars are finding new ways to automate the classification of key map features while also using the text within maps, so that the text and other information become indexable and searchable. This makes old maps a more powerful resource, as they become as effective as modern digital maps where similar approaches have been used.

By utilizing deep learning, researchers can extract a lot of new data from historical maps, including text and visual data, connecting our modern world with landscapes and regions that have changed or have information now missing from our modern maps. 


[1]    For more on the new mapKurator project and approach, including information on the code used, see:  Li, Zekun, Yao-Yi Chiang, Sasan Tavakkol, Basel Shbita, Johannes H. Uhl, Stefan Leyk, and Craig A. Knoblock. 2020. ‘An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images’. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3290–98. Virtual Event CA USA: ACM. https://doi.org/10.1145/3394486.3403381.

[2]    For more on data augmentation methods used to enhance deep learning methods to improve location classification on maps, see:  Hu, Yingjie, Zhipeng Gui, Jimin Wang, and Muxian Li. 2021. ‘Enriching the Metadata of Map Images: A Deep Learning Approach with GIS-Based Data Augmentation’. International Journal of Geographical Information Science, August, 1–23. https://doi.org/10.1080/13658816.2021.1968407.

[3]    For more on the automated road detection approach from old maps, see:  Ekim, Burak, Elif Sertel, and M. Erdem Kabadayı. 2021. ‘Automatic Road Extraction from Historical Maps Using Deep Learning Techniques: A Regional Case Study of Turkey in a German World War II Map’. ISPRS International Journal of Geo-Information 10 (8): 492. https://doi.org/10.3390/ijgi10080492.

[4]    For more on classifying symbols from old maps, see:  Groom, G., G. Levin, S. R. Svenningsen, and M. L. Perner. 2020. ‘Historical Maps: Machine Learning Helps Us over the Map Vectorisation Crux’. In Proceedings of the ICA Workshop on Automatic Vectorisation of Historical Maps, 89–98.


About the author
Mark Altaweel
Mark Altaweel is a Reader in Near Eastern Archaeology at the Institute of Archaeology, University College London, having held previous appointments and joint appointments at the University of Chicago, University of Alaska, and Argonne National Laboratory. Mark has an undergraduate degree in Anthropology and Masters and PhD degrees from the University of Chicago’s Department of Near Eastern Languages and Civilizations.