Python and Geospatial Analysis

Mark Altaweel


There is no doubt that Python has become the main computer language that geospatial analysts and researchers use in their work in GIS and spatial analysis more broadly.

Those interested in knowing more may ask why this has become the case and what the recent trends are.

Two podcast episodes help address these questions: one on geospatial uses of Python and one on Jupyter Notebooks. Broader trends and other works also help to explain the shift.

The Adoption of Python in GIS

Anita Graser highlights in her podcast episode the tremendous growth that GIS, geospatial analysis, and Python have experienced together over the last decade and more. Initially, this marriage between a computer language and geospatial platforms occurred when major GIS platforms such as ArcGIS and QGIS began to adopt Python as their main scripting, toolmaking, and analytical language.[1]

The emergence of PostGIS, with its focus on handling geospatial objects, has also helped, as it is deployed alongside a number of GIS environments and projects such as QGIS, ArcGIS, and OpenStreetMap. This is also the case with less widely used platforms such as GRASS.






For users, perhaps the main reason for adopting Python is that it is easy to learn, good at data manipulation, and has many useful libraries that are suited to, or could easily be adapted for, geospatial analysis.

Graser highlighted Pandas and her own work with GeoPandas.[2]

Pandas makes data manipulation, analysis, and handling far easier than in some other languages, while GeoPandas makes the benefits of Pandas available for geospatial data by using common spatial objects and adding capabilities in interactive plotting and performance.
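As a rough illustration of how GeoPandas builds on Pandas, the sketch below turns an ordinary table into a spatial one and then mixes a regular Pandas filter with a spatial query. The sample points, column names, and study area are invented for the example, and it assumes the pandas, geopandas, and shapely packages are installed.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Hypothetical sample data: three made-up observation points.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "lon": [16.37, 13.40, 2.35],
    "lat": [48.21, 52.52, 48.86],
    "value": [10, 20, 30],
})

# Attach point geometries built from the lon/lat columns (WGS 84 coordinates).
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lon"], df["lat"]),
    crs="EPSG:4326",
)

# An ordinary Pandas operation still works on the spatial table.
high = gdf[gdf["value"] >= 20]

# A spatial operation: keep only points inside a made-up rectangular study area
# given as (min lon, min lat, max lon, max lat).
study_area = box(10.0, 45.0, 20.0, 55.0)
inside = gdf[gdf.within(study_area)]
```

Here `high` is selected on attribute values alone, while `inside` is selected on location, which is the kind of combined workflow the two libraries are meant to support.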

The large and growing list of Python libraries gives users many options for leveraging existing code and building more powerful features into their tools.

Platforms such as QGIS allow users to add their own extensions built in Python, further encouraging development and use of Python among GIS specialists.

This growth suggests that, as GIS users and geospatial analysts develop their skills, Python might be the best language to focus on.

Relative to other high-level languages, Python is easier to use: it is flexible in coding style and can be applied within different paradigms, including imperative, functional, procedural, and object-oriented approaches.[3]
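To make the point about paradigms concrete, here is a minimal sketch that computes the same quantity, the length of a track, in three styles. The track coordinates and the function and class names are hypothetical, and the example assumes planar coordinates in metres rather than longitude and latitude.

```python
from dataclasses import dataclass
from math import hypot

# Hypothetical track: planar (x, y) coordinates in metres.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def length_procedural(points):
    """Imperative/procedural style: loop and update a running total."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += hypot(x2 - x1, y2 - y1)
    return total


def length_functional(points):
    """Functional style: map a pure function over the segments and sum."""
    segments = zip(points, points[1:])
    return sum(map(lambda s: hypot(s[1][0] - s[0][0], s[1][1] - s[0][1]), segments))


@dataclass
class Track:
    """Object-oriented style: data and behaviour live together."""
    points: list

    def length(self):
        return length_procedural(self.points)


results = (length_procedural(track), length_functional(track), Track(track).length())
```

All three give the same answer for this track; which style reads best tends to depend on the project and the team rather than on the language.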

Popular platforms have also made it easier to build functions by adding model builders, which are extensions that help with basic programming and organization by linking user-created data and functionality.

Using GeoPandas in QGIS for trajectory data handling. Image: Anita Graser.

There are, of course, problems and obstacles that Python users have found to be a hindrance. These include common compatibility issues, where installed libraries may not work well together or different versions cause exceptions in the code. Well-written instructions and installation files can help address this, but not all libraries have them.

There are tools that make library installation easier, such as Conda, a package and environment manager. Anaconda is a distribution that bundles Conda and includes a graphical interface, Anaconda Navigator. Users also have access to Python development environments such as PyCharm and Spyder, among many others.

Python has also branched out to incorporate the strengths of other languages through libraries that allow direct or comparable use of those languages. For performance, C has long been one of the best languages to use, and Cython brings C/C++-like performance to Python; it is commonly used to help with the speed and scaling of data analysis.

While other languages such as Scala and Java could be worth learning, for example for large-scale manipulation of geospatial data, Python is increasingly being applied to big data problems thanks to parallel computing libraries and more tools taking advantage of graphics processing unit (GPU) architecture.

Using Jupyter Notebooks in GIS

One criticism applied to code-based research has been the difficulty in replicating results and documenting findings.

One set of tools, which can be applied to Python but also many other computer languages, is the Jupyter family of tools, including Jupyter Notebooks, highlighted by Julia Wagemann in her podcast episode. 

Jupyter tools help with executing, documenting, and displaying how code works. Jupyter Notebook is perhaps the best known in this family of tools. It allows code to be written in cells, or blocks, that integrate data and code in small segments and show the output directly in the notebook.
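One way to see how cells keep documentation, code, and output together is to look at how a notebook is stored. The sketch below builds a minimal notebook as a Python dictionary following the general layout of the nbformat 4 JSON format; the cell contents are hypothetical, and a real notebook written by Jupyter will usually carry more metadata than shown here.

```python
import json

# Minimal notebook in the nbformat 4 JSON layout. The cell contents are
# made up and only illustrate the structure.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # A markdown cell holds the written explanation.
            "cell_type": "markdown",
            "metadata": {},
            "source": "Compute the area of a 2 km by 3 km study region.",
        },
        {
            # A code cell holds the code and, once run, its saved output.
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "area_km2 = 2 * 3\nprint(area_km2)",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "6\n"}
            ],
        },
    ],
}

# Serialised like this, the dictionary is what a .ipynb file contains.
notebook_json = json.dumps(notebook, indent=1)

# Walk the cells to list explanation, code, and output side by side.
summary = []
for cell in notebook["cells"]:
    outputs = [o["text"] for o in cell.get("outputs", [])]
    summary.append((cell["cell_type"], cell["source"], outputs))
```

Because the output is saved next to the code that produced it, a reader can follow the analysis without rerunning anything.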

This allows users to see how given code works, acts as a type of documentation or aid to documentation, and aids in the learning of what the given code is doing.

Jupyter Notebooks have been likened to Google Docs for code, where collaborators can work together and share how given parts of the code work and how their output is displayed.

Another great benefit is that notebooks allow you to move between different computer languages. For instance, many geospatial projects use Python for geospatial functions but then apply R, another popular analytical language, for visual display or statistical analysis.

Using Jupyter Notebooks allows you to show the code for each language used, while also displaying the links between them, so that a replicable workflow can be developed across the two. This is possible because each notebook can use a different kernel.
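The kernel a notebook runs under is recorded in the notebook's own metadata, which is one reason a mixed-language workflow can be replicated by someone else. The sketch below is a hypothetical two-step workflow with one notebook per language. The file names are invented, and while "python3" and "ir" are the usual kernel names for the IPython and IRkernel kernels, an installation can register kernels under other names.

```python
# Hypothetical workflow: each entry stands for one notebook and shows only
# the "kernelspec" part of that notebook's metadata.
workflow = [
    {
        "file": "01_prepare_geodata.ipynb",  # made-up file name
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
    },
    {
        "file": "02_statistics.ipynb",  # made-up file name
        "kernelspec": {"name": "ir", "display_name": "R", "language": "R"},
    },
]


def languages_in_order(steps):
    """Return the language each notebook in the workflow runs under."""
    return [step["kernelspec"]["language"] for step in steps]


languages = languages_in_order(workflow)
```

In this arrangement the first notebook would typically write its results to a file that the second notebook reads, so the hand-off between languages is explicit.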

For geospatial purposes, Jupyter Notebooks make it easier to show visual output and replicate it between teams, while making access to data easier through integrated data links, including big data.

Previously, users had to download possibly large data files, which made replication difficult or cumbersome. For scientists, this is of great importance since it means researchers can verify and build on existing work more easily. We can think of a Jupyter Notebook as something that provides documentation, debugging, and execution in one environment, which also makes it useful for learning to code.

As Python rises in geospatial analysis, people who may not be adept at coding but want to learn Python could use Jupyter Notebooks to learn parts of code in a simple and easy-to-use manner. JupyterHub is a multi-user server that helps teams collaborate by serving Jupyter Notebooks to multiple users. It helps ensure the needed libraries are installed and allows collaborators to see what the others are developing, with editing and input from the users.

Another tool in the Jupyter family is JupyterLab, a web-based interface that supports collaboration and different data formats. It can link to the other Jupyter tools used for development while sharing and accessing Jupyter Notebooks. The Voilà tool, also part of the Jupyter family, can be used to turn notebooks into web applications.[4]

Ultimately, the threshold for learning and developing Python tools for spatial analysis has become lower, which means Python may continue for some time as the dominant language for geospatial applications.

The split map function is a part of the ipyleaflet package, an interactive maps visualization system for Jupyter. Image: Jupyter Blog.

Python has become the dominant language for geospatial analysis because it was adopted by major GIS platforms, because users increasingly saw its potential for data analysis, and because its relatively easy-to-understand syntax has helped to increase user numbers. Many libraries now exist that help users create complex applications, sometimes with minimal coding, by combining different libraries.

Popular tools such as QGIS have encouraged the use of Python by allowing the wider community to contribute plugins written in Python. Tools such as Jupyter Notebooks also make it easier to learn Python, work through given projects, and replicate results.

Many tools have been developed from the start as open source and are easy to access, further encouraging users.

For geospatial analysts, Python has become an indispensable tool for developing applications and powerful analyses. 


References

[1] For more on the adoption of Python in GIS and its benefits, see: https://www.geographyrealm.com/use-python-gis/.

[2] For more on Pandas and GeoPandas, see: https://pandas.pydata.org/ and https://geopandas.org/ respectively.

[3] For more on Python and geospatial analysis and GIS integration, see: Toms, S., Rees, E. V., & Crickard, P. (2018). Mastering Geospatial Analysis with Python. Packt Publishing Ltd.

[4] For more on the Jupyter family of tools, including Jupyter Notebooks, see: Vanderplas, J. T. (2016). Python Data Science Handbook: Essential Tools for Working with Data (First edition). Sebastopol, CA: O’Reilly Media, Inc.

About the author
Mark Altaweel
Mark Altaweel is a Reader in Near Eastern Archaeology at the Institute of Archaeology, University College London, having held previous appointments and joint appointments at the University of Chicago, University of Alaska, and Argonne National Laboratory. Mark has an undergraduate degree in Anthropology and Masters and PhD degrees from the University of Chicago’s Department of Near Eastern Languages and Civilizations.