- Relational data are critical to geospatial researchers
- Learning SQL and using it within GIS can have great payoff for users
- Tools such as PostGIS make a powerful backend for geospatial software
One way to leverage the power of GIS as a data service and analytical tool is to better integrate relational database capabilities. Tools such as PostGIS offer this capability, where it effectively extends Postgre SQL as a geospatial service.
In a MapScaping podcast with Paul Ramsay, a co-founder of PostGIS, he makes the case that geospatial specialists could greatly benefit by learning and integrating relational databases in their projects and tools.
Relational databases offer the capability of quickly providing needed data retrieval within complex data problems, such as merging and integrating varied datasets for use.
Free weekly newsletter
Fill out your e-mail address to receive our newsletter!
By entering your email address you agree to receive our newsletter and agree with our privacy policy.
You may unsubscribe at any time.
The relational model is efficient while bringing data closer to the software by being stored in the execution layer. This advantage is evident by the fact that relational databases can conduct computational complex operations with minimal coding.
The challenge for geospatial experts, however, is they need to learn languages such as Structured Query Language (SQL) to access such capabilities. Unlike third generation programming languages, such as Python or R, fourth generation languages such as SQL abstract computing even further and specify what you want out of a function.
While this removes loop or other syntax that many are use to, it could also make the syntax of SQL seem daunting to some given its differences. People often become adapted to procedural languages so much that SQL may seem somewhat alien.
Often, when people do use SQL in a geospatial process, the coding is inefficiently used, such as returning everything from a database rather than specifying the exact query needed. Nevertheless, power commands, such as table joins, allow users to do complex functions with only two lines of coding, minimizing the need to create a large set of code for such operations using a procedural language.
There is effectively a big payoff once geospatial experts leverage SQL by conducting operations with few lines of coding that can be directly integrated into tasks.
There are also tools to make life easier, such as Apache Hadoop and even PostGIS is effectively a way to facilitate SQL commands.
The fact that SQL has existed since the 1970s and became standardized by the 1980s shows that the relational model is still among the most powerful data models available.
Lesson Learned from Using SQL in GIS
The lessons we learn from SQL is that spatial data experts who take the time to learn basic query and analytical functions in SQL could have greater payoff. In fact, it might be best to learn SQL without spatial operations initially so that GIS knowledge does not interfere with the learning process and one can focus on the syntax that makes relational models powerful.
One example of a mistake is that geospatial experts may be tempted to use buffer functions to find data within a given region from a point. But, in reality, all one needs is a distance radius function that can return the same data around a point without the extra computational costs that buffer functions give.
You can get performance penalties even if you find the data you need; learning an efficient process for finding relevant data can have better performance payoff.
There are, of course, challenges with using SQL tools by themselves, which includes the fact that SQL tools for GIS are not actually visual tools, although GIS tools could visualize them. Using SQL could be difficult for spatial experts adapted to seeing the results of their work in a clear visual process.
Nevertheless, part of the benefit to the geospatial community is the analytical capability of relational tools such as PostGIS. Currently, there are over 700 analytical commands and more are being added with every release (currently version 3.1).
Some commands include those contributed by the user community, such as a concave function applied within the tool. Indexing and scheduling of tasks as actions allow users to structure databases flexibly for projects and more complex geospatial software development.
PostGIS uses an inbuilt language called PL/pgSQL, which is analogous and comparable to Oracle’s PL/SQL. There are also capabilities for functions to be done outside of the database such as SQL script that can run an external scheduler.[1]
NoSQL Databases
In recent years, NoSQL databases have become popular, in part because they do not use the same form of syntax that SQL and relational models use. (More: GIS and NoSQL Databases) Some users like the flexibility in creating their own data structures and models, where users can create and define a given data model.
Nevertheless, PostGIS and SQL in general do provide greater analytical capabilities, in addition to data retrieval and storage, which NoSQL tends to be weaker on. By having analytics directly within a tools such as PostGIS, then users do not need to spend extra time creating an analytical layer in functions.
Despite the fact that SQL has existed for more than 4o years as a language, it is still powerful in providing a relational model to users.
The relational model has proven to be fast, efficient, and provides powerful computing for data retrieval. This likely means that SQL-based tools are likely to be around for much longer.
For geospatial experts, this means learning the relational model and tools such as PostGIS or other SQL-related tools can have benefits in making more powerful geospatial applications.
Reference
[1] For more on SQL and the relational model, see: Date, C.J., 2015. SQL and relational theory: how to write accurate SQL code, Third edition. ed, Theory in practice. O’Reilly Media, Inc, Sebastopol, CA.