GIS data is not perfect. Like any other data, it can contain errors or inaccuracies that may affect the results of GIS analysis.
Some common sources of errors in GIS data include incomplete or outdated data sources, errors in data entry or conversion, imprecise or inaccurate measurements, and inherent limitations of the data collection method.
Anyone who works with GIS data should understand that error, inaccuracy, and imprecision can undermine the quality of many types of GIS projects: errors that are not accounted for can turn the analysis into a useless exercise.
Understanding the error inherent in GIS data is critical to ensuring that any spatial analysis performed using those datasets meets a minimum threshold for accuracy. The saying “Garbage in, garbage out” applies all too well when inaccurate, imprecise, or error-ridden data is used in an analysis.
The power of GIS resides in its ability to use many types of data related to the same geographical area to perform the analysis, integrating different datasets within a single system.
When a new dataset is loaded into a GIS software application, the software imports not only the data but also the error that the data contains. The first step in dealing with error is to be aware of it and to understand the limitations of the data being used.
Accuracy and Precision
Accuracy and precision are both important aspects of GIS data quality, but they refer to different things.
To understand why accuracy and precision matter, we should start with the difference between the two terms:
Accuracy can be defined as the degree of closeness with which the information on a map matches the values in the real world. Therefore, when we refer to accuracy, we are talking about the quality of the data and the number of errors contained in a given dataset.
In GIS data, accuracy can refer to geographic position, but it can also refer to attribute or conceptual accuracy.
Precision refers to how exactly the data is described. Precise data may still be inaccurate, because it may be exactly described but inaccurately gathered (for example, the surveyor may have made a mistake, or the data may have been recorded incorrectly in the database).
In the series of images above, the concept of precision versus accuracy is visualized. The crosshair of each image represents the true value of the entity and the red dots represent the measured values.
Image A is precise and accurate, image B is precise but not accurate, image C is accurate but imprecise, and image D is neither accurate nor precise.
Understanding both accuracy and precision is important for assessing the usability of a GIS dataset. When a dataset is inaccurate but highly precise, corrective measures can be taken to adjust the dataset to make it more accurate.
Assessing error involves considering both the imprecision of the data and its inaccuracies.
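The distinction can be made concrete with a small calculation. The sketch below is only an illustration: it assumes repeated 2D measurements of a single point whose true position is known, and it uses the offset of the mean measurement as a stand-in for inaccuracy and the scatter around that mean as a stand-in for imprecision. The function name and the sample coordinates are hypothetical.

```python
import math
import statistics


def accuracy_and_precision(measurements, true_point):
    """Return (bias, spread) for repeated (x, y) measurements of one point.

    bias:   distance from the mean measured position to the true position
            (a small bias suggests accurate data).
    spread: root-mean-square distance of the measurements from their own mean
            (a small spread suggests precise data).
    """
    mean_x = statistics.fmean([x for x, _ in measurements])
    mean_y = statistics.fmean([y for _, y in measurements])
    bias = math.hypot(mean_x - true_point[0], mean_y - true_point[1])
    squared = [(x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in measurements]
    spread = math.sqrt(statistics.fmean(squared))
    return bias, spread


# Hypothetical readings of a marker whose true position is (100.0, 200.0).
true_point = (100.0, 200.0)
precise_but_inaccurate = [(103.0, 204.0), (103.1, 203.9), (102.9, 204.1), (103.0, 204.0)]
accurate_but_imprecise = [(98.0, 200.0), (102.0, 200.0), (100.0, 198.0), (100.0, 202.0)]

for name, data in [("precise but inaccurate", precise_but_inaccurate),
                   ("accurate but imprecise", accurate_but_imprecise)]:
    bias, spread = accuracy_and_precision(data, true_point)
    print(f"{name}: bias={bias:.2f}, spread={spread:.2f}")
```

In the first sample the readings cluster tightly but sit about 5 units from the true position, which is the kind of consistent offset that can be corrected by shifting the whole dataset. In the second sample the mean is on target but the individual readings are scattered, and no single shift will fix that.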
Sources of Inaccuracy and Imprecision
Some sources of error in GIS data are obvious, whereas others are more difficult to notice. GIS software can lead users to believe that their data is more accurate and precise than it really is.
Scale of GIS Data
Scale, for example, is an inherent source of error in cartography: depending on the scale used, we can represent different types of data, in different quantities and with different levels of quality. Cartographers should always match the working scale to the level of detail their projects need.
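One way to see how scale limits detail is to convert a distance on the map into the distance it covers on the ground. The sketch below assumes a drawn line 0.5 mm wide; that width is an illustrative value chosen for this example, not a standard, and the function name is hypothetical.

```python
def ground_distance_m(map_distance_mm, scale_denominator):
    """Convert a distance measured on the map (mm) to ground distance (m).

    At a scale of 1:N, one unit on the map represents N units on the ground,
    so the result is map_distance_mm * N millimetres, divided by 1000 for metres.
    """
    return map_distance_mm * scale_denominator / 1000.0


LINE_WIDTH_MM = 0.5  # assumed line width, for illustration only

for denominator in (24_000, 100_000, 250_000):
    width_m = ground_distance_m(LINE_WIDTH_MM, denominator)
    print(f"1:{denominator}: a {LINE_WIDTH_MM} mm line covers {width_m} m on the ground")
```

Under that assumption, the same line covers 12 m on the ground at 1:24,000 and 125 m at 1:250,000, so a feature digitized from the smaller-scale map cannot be trusted to a finer level of detail than that.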
Age of GIS Data
The age of data is another obvious source of error. When data sources are too old, some or even much of the underlying information may have changed.
GIS users should always consider how current a dataset is before using it for contemporary analysis.
GIS Data Formatting Errors
Some errors are created when data is formatted for processing. Changes in scale, reprojections, and conversions between raster and vector are all possible sources of formatting errors.
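As a rough illustration of a conversion error, the sketch below snaps a vector point to the centre of the raster cell that contains it and measures how far the point moved. It assumes a square grid with its origin at (0, 0); the function names, coordinates, and 30-unit cell size are hypothetical.

```python
import math


def snap_to_cell_center(x, y, cell_size):
    """Return the centre of the square grid cell containing (x, y).

    Assumes the grid origin is at (0, 0) and cells are cell_size units wide.
    """
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def snap_displacement(x, y, cell_size):
    """Distance between a point and the centre of the cell it falls in."""
    cx, cy = snap_to_cell_center(x, y, cell_size)
    return math.hypot(cx - x, cy - y)


def max_snap_displacement(cell_size):
    """Upper bound on the displacement: half the diagonal of a cell."""
    return cell_size * math.sqrt(2) / 2


# Hypothetical point and cell size, in the same linear units.
moved = snap_displacement(1234.0, 5678.0, 30.0)
print(f"point moved {moved:.2f} units; upper bound is {max_snap_displacement(30.0):.2f}")
```

The displacement can never exceed half the cell diagonal, so the cell size chosen for a vector-to-raster conversion sets a floor on the positional error of the result.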
Other sources of error are less obvious. Some originate in the initial measurements, and others are introduced by users at the moment the data is captured.
These errors can often be classified as quantitative or qualitative.
Label or attribute errors are a common qualitative mistake.
For instance, agricultural land may be incorrectly marked as a marsh, and a map user who is not familiar with the area in question may never notice the error.
Quantitative errors may also occur when using instruments that have not been properly calibrated. The resulting errors are hard to identify in the field, but they will cause your project to lose accuracy and reliability.
Positional Accuracy of GIS Data
We also have to pay attention to positional accuracy, which depends on the type of data.
Cartographers can accurately locate certain features, such as roads and boundary lines, but other data with a less well-defined position in space, such as soil types, may be given only an approximate location based on the cartographer’s estimate.
Other features, such as climate, lack defined boundaries in nature and are therefore subject to subjective interpretation.
Topological errors often occur during the digitizing process.
Errors made while digitizing or creating GIS data may result in polygon knots, loops, or other malformed polygons, and damaged source maps can introduce errors of their own.
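A knot or loop of this kind can be detected by checking whether any two non-adjacent edges of a polygon cross each other. The following is a minimal brute-force sketch, assuming a single ring given as a list of (x, y) vertices without a repeated closing vertex; the function names are hypothetical, and in practice a geometry library’s validity check would normally be used instead.

```python
def _orient(a, b, c):
    """Signed area test: > 0 if a->b->c turns left, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at a single interior point."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def has_self_intersection(ring):
    """Check a polygon ring for edges that cross each other.

    ring: list of (x, y) vertices; the first vertex is not repeated at the end.
    Only proper crossings are detected, not edges that merely touch.
    """
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Skip neighbouring edges, which always share a vertex.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
bowtie = [(0, 0), (4, 4), (4, 0), (0, 4)]  # two vertices digitized out of order
print(has_self_intersection(square), has_self_intersection(bowtie))
```

The bowtie uses the same four corners as the square, but with two vertices swapped, which is a typical digitizing slip. The check compares every pair of edges, so it is only suitable for small rings.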
Intentional GIS Data Errors
Errors can be intentionally introduced in GIS data.
Generalization of GIS data
Most commonly, generalization in GIS means reducing the amount of detail in a GIS dataset. Generalization introduces error by removing aspects of a feature.
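Line simplification is one common form of generalization. The sketch below uses a Douglas-Peucker-style routine as an example technique (the article does not name a specific algorithm): it drops vertices that lie within a chosen tolerance of the simplified line, so the tolerance is effectively the amount of positional error being accepted. The function names and sample line are hypothetical.

```python
import math


def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker-style simplification of a list of (x, y) vertices.

    Vertices closer than `tolerance` to the simplified line are removed.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify each half separately.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.2), (2, 0), (3, 0.2), (4, 0)]
print(len(simplify(line, 0.1)), "vertices kept at tolerance 0.1")
print(len(simplify(line, 0.5)), "vertices kept at tolerance 0.5")
```

With the small tolerance every vertex survives; with the larger one the zigzag collapses to a straight line between the endpoints, and the detail that was removed can no longer be recovered from the generalized dataset.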
Another intentional introduction of error is the fingerprinting sometimes found within datasets from commercial GIS vendors, who insert deliberate errors so that unauthorized copies can be identified. For example, a GIS data vendor may insert false streets or fake street names into a dataset. This kind of intentional error in a GIS dataset is called a “map trap”.
Always Factor in the Potential Error in GIS Datasets
We should never forget that inaccuracy, imprecision, and the resulting error may be compounded in a GIS project that draws on more than one data source. In these projects, one error leads to another, compounding the effect on the analysis and affecting the entire project.
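A rough feel for how errors compound can be had by combining the positional error of each layer. The sketch below shows two common conventions, neither taken from this article: a root-sum-of-squares estimate, which assumes the layers’ errors are independent and random, and a worst-case estimate that simply adds them. The function names and error values are hypothetical.

```python
import math


def combined_error_rss(errors):
    """Root-sum-of-squares estimate; assumes independent, random errors."""
    return math.sqrt(sum(e * e for e in errors))


def combined_error_worst_case(errors):
    """Worst case: every error pushes in the same direction."""
    return sum(abs(e) for e in errors)


# Hypothetical positional errors (in metres) of three overlaid layers.
layer_errors = [3.0, 4.0, 12.0]
print("root-sum-of-squares:", combined_error_rss(layer_errors))
print("worst case:", combined_error_worst_case(layer_errors))
```

For these three layers the estimates are 13 m and 19 m respectively. Either way, the combined error is larger than the error of any single layer, which is why each additional data source needs to be weighed against the uncertainty it brings.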
It is important to recognize that GIS data is always subject to some degree of uncertainty and error, and to take steps to minimize and account for these errors when conducting GIS analysis. This may involve using multiple sources of data, validating and cross-checking data, and being transparent about any limitations or assumptions in the analysis.
Metadata (data about the data) is one of the first tools that any GIS user should consult in order to learn more about the data being used and to avoid adding more error to data that will never be perfect in any case. Good metadata should always include basic information such as the age of the data, its origin, the area it covers, scale, projection system, accuracy, and format.
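A simple completeness check can flag datasets whose metadata is missing any of these basics. The sketch below assumes the metadata has already been read into a plain dictionary; the field names are hypothetical and do not follow any particular metadata standard.

```python
# Hypothetical field names covering the basics listed above.
REQUIRED_FIELDS = ("date", "origin", "extent", "scale", "projection", "accuracy", "format")


def missing_metadata(metadata):
    """Return the required fields that are absent or empty in a metadata dict."""
    return [field for field in REQUIRED_FIELDS if metadata.get(field) in (None, "")]


# Hypothetical, incomplete metadata record.
record = {"date": "2011-11-06", "scale": "1:24000", "format": ""}
print("missing:", missing_metadata(record))
```

A dataset that fails this kind of check is not necessarily bad, but its error cannot be assessed until the gaps are filled in.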
This article was originally written on November 6, 2011 and has since been updated.
About the Author
Manuel S Pascual (born in Sevilla, Spain) has a Master’s degree in Geography from the University of Seville with majors in Cartography and Photogrammetry. Pascual did his postgraduate work at UNM with an emphasis in GIS and Remote Sensing. Pascual has extensive international professional experience in the field of GIS, with projects in Biology, Hydrology, and Environmental Sciences. Pascual has worked on projects for different governmental agencies, including the US Forest Service, State of New Mexico, City of Albuquerque, Bernalillo County, and the Ministere des eaux et forets in Morocco.