Creating Ground-level Views from Satellite Imagery

Mark Altaweel

Many techniques, drawing on statistics or artificial intelligence, exist to classify and identify areas in satellite imagery, including land use types such as urban space, agricultural land, and forest. Recreating a ground-level image and perspective from satellite imagery, however, has only recently become possible and is now an active area of research.

Such work has the potential not only to classify land more accurately but also to provide a ground-level perspective that shows how an area differs from, or resembles, other areas of the same class. (Related: Converting Historical Maps to Satellite-Like Imagery)

Using Machine Learning to Recreate Ground-level Views From Satellite Imagery

One pioneering technique for producing ground-level views from satellite images was developed at the University of California, Merced. Conditional generative adversarial networks (cGANs) were trained on paired imagery: satellite views of different land covers together with the ground-based images associated with them.

These are geolocated photographs that correspond to an area visible in a satellite view. The algorithm uses a so-called 'generator' to create images, which are then assessed by a 'discriminator' that judges how plausible each view is. Over time, the generator learns which features of the satellite imagery best reproduce ground-level views.
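
To make the generator/discriminator interplay concrete, below is a minimal sketch of a conditional GAN in PyTorch. This is not the Merced team's architecture; the layer sizes, the 64x64 patches, and the randomly generated "paired" batch are illustrative assumptions only.

```python
# Minimal conditional GAN sketch (PyTorch): a generator maps an overhead
# patch to a ground-level image; a discriminator scores (overhead, ground)
# pairs as real or generated. Shapes and layers are illustrative only.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encode a 64x64 overhead RGB patch, decode a 64x64 ground-level image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),            # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),           # 32 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Tanh(),
        )

    def forward(self, overhead):
        return self.net(overhead)

class Discriminator(nn.Module):
    """Score an (overhead, ground-level) pair, concatenated on channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1),    # 64 -> 32
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 32 -> 16
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, 1),                 # real/fake logit
        )

    def forward(self, overhead, ground):
        return self.net(torch.cat([overhead, ground], dim=1))

# One illustrative training step on a random stand-in batch.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

overhead = torch.randn(8, 3, 64, 64)  # stand-in for geolocated satellite patches
ground = torch.randn(8, 3, 64, 64)    # stand-in for matching ground-level photos

# Discriminator: push real pairs toward 1, generated pairs toward 0.
fake = G(overhead).detach()
d_loss = bce(D(overhead, ground), torch.ones(8, 1)) + \
         bce(D(overhead, fake), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator: fool the discriminator into scoring its output as real.
g_loss = bce(D(overhead, G(overhead)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```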

The generated images are not highly detailed, but simple features such as roads, trees, and houses are produced automatically. Using London as a testbed, the authors report a land use classification accuracy of about 73% with their technique.[1]



Top: Proposed conditional generative adversarial network consisting of a generator, which produces ground-level views given overhead imagery, and a discriminator, which helps train the generator as well as learn useful representations. Bottom: Select overhead image patches, the ground-level views generated by the framework, and the real ground-level images. Source: Deng, Zhu, & Newsam, 2018.

Most previous work in this area attempted to merge existing ground-level views with satellite imagery rather than auto-generate new, artificial images. Instead of a learning algorithm, these methods use image merging and interpolation techniques to build 3D models and views that combine aerial and ground-level images into a more complete perspective. Unlike the University of California, Merced approach, however, these techniques cannot be applied to areas for which no ground-level imagery exists.[2]

Convolutional Neural Network (CNN) Approaches

In general, it is only in the last few years that there have been advances in creating ground-level views from satellite imagery. Another recent technique uses a convolutional neural network (CNN) to predict how pixel values in aerial imagery relate to their ground-level counterparts.

As with the cGAN method, learning draws on a combination of ground-level views and satellite imagery. Semantic segmentation is applied to both the aerial and ground-level images to assign an interpretive value to each part of the scene, which the generator then uses to recreate the scene after training.[3]
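
The sketch below illustrates this idea: a small encoder-decoder CNN maps an aerial patch to per-pixel class scores for the corresponding ground-level scene layout. It is not the architecture from the cited paper; the class set, image sizes, and layers are assumptions for illustration.

```python
# Hedged sketch: learn a mapping from an aerial patch to the semantic layout
# (per-pixel class scores) of the corresponding ground-level view.
import torch
import torch.nn as nn

NUM_CLASSES = 4  # e.g. road, vegetation, building, sky (assumed label set)

class AerialToGroundLayout(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # summarize the aerial patch
            nn.Conv2d(3, 32, 3, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # emit a ground-view label map
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),           # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, NUM_CLASSES, 4, stride=2, padding=1),  # 32 -> 64
        )

    def forward(self, aerial):
        # Returns (batch, NUM_CLASSES, 64, 64) class logits.
        return self.decoder(self.encoder(aerial))

model = AerialToGroundLayout()
aerial = torch.randn(2, 3, 64, 64)                   # stand-in aerial patches
target = torch.randint(0, NUM_CLASSES, (2, 64, 64))  # paired ground-view labels
loss = nn.CrossEntropyLoss()(model(aerial), target)  # supervise with paired data
loss.backward()
```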

Geolocalization of ground-level photographs is a related problem: determining where on a satellite image a photograph was taken raises similar challenges. One approach that could help with reconstructing ground-level views from aerial imagery is volumetric estimation of an area, which can help generate candidate matches, or at least indicate whether a given area is relevant, when pairing a ground-level view with an aerial one.

By segmenting both types of images by volume rather than shape alone, and then comparing those volumes, a photograph's location can be determined more accurately when only a satellite view is available for comparison. This is relevant to ground-level scene generation because it could be used to recreate more complete 3D scenes, and 3D approaches have generally been more accurate at reproducing ground-level features.[4]
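
As a hedged sketch of what volume-based matching might look like, the example below summarizes each candidate area as a coarse voxel occupancy grid, collapses it to a height profile, and ranks candidates by similarity to the profile estimated from the ground photo. The grids, the height-profile signature, and the similarity metric are all illustrative assumptions, not the method of the cited paper.

```python
# Illustrative volume-based matching: rank candidate areas by how closely
# their volumetric signature matches the one estimated from a ground photo.
import numpy as np

def volumetric_signature(occupancy: np.ndarray) -> np.ndarray:
    """Collapse an (x, y, z) occupancy grid into a height profile: for each
    ground cell, the level of the highest occupied voxel (a cheap stand-in
    for the structure volume above that cell)."""
    heights = occupancy.shape[2] - np.argmax(occupancy[:, :, ::-1], axis=2)
    heights[~occupancy.any(axis=2)] = 0  # empty columns have zero height
    return heights

def match_score(sig_a: np.ndarray, sig_b: np.ndarray) -> float:
    """Similarity of two height profiles (negative mean absolute difference,
    so higher is better)."""
    return -float(np.abs(sig_a - sig_b).mean())

rng = np.random.default_rng(0)
# Candidate areas from overhead imagery, each a 16x16x8 boolean voxel grid.
candidates = [rng.random((16, 16, 8)) > 0.7 for _ in range(5)]
query = candidates[2].copy()  # pretend the ground photo's estimate matches #2

query_sig = volumetric_signature(query)
scores = [match_score(volumetric_signature(c), query_sig) for c in candidates]
print("best candidate:", int(np.argmax(scores)))  # -> 2
```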

While generating accurate ground-level views remains complex and limited overall, there has been considerable recent progress. The main difficulty lies in producing more detailed and accurate ground-level views that better describe an area. Solving this problem would benefit land use classification and other applications, and these limitations will likely be addressed in the near future as scientists focus more on this area.

References

[1]    For more on generating land cover types and ground views from satellite imagery, see: Deng, X., Zhu, Y., & Newsam, S. (2018). What is it like down there? Generating dense ground-level views and image features from overhead imagery using conditional generative adversarial networks. Submitted to ACM SIGSPATIAL. https://arxiv.org/pdf/1806.05129.pdf.

[2]    For more on image development using ground images and airborne views, see: Frueh, C., & Zakhor, A. (2003). Constructing 3D city models by merging ground-based and airborne views (Vol. 2, pp. II-562–569). IEEE Computer Society. https://doi.org/10.1109/CVPR.2003.1211517.

[3]    For more on CNN algorithms used for aerial and ground-level scene generation and prediction, see:  Zhai, M., Bessinger, Z., Workman, S., & Jacobs, N. (2017). Predicting Ground-Level Scene Layout from Aerial Imagery (pp. 4132–4140). IEEE. https://doi.org/10.1109/CVPR.2017.440.

[4]    For more on volumetric measurements and accuracy of geolocating, see: Ozcanli, O. C., Dong, Y., & Mundy, J. L. (2016). Geo-localization using Volumetric Representations of Overhead Imagery. International Journal of Computer Vision, 116(3), 226–246. https://doi.org/10.1007/s11263-015-0850-9.

About the author
Mark Altaweel
Mark Altaweel is a Reader in Near Eastern Archaeology at the Institute of Archaeology, University College London, having held previous appointments and joint appointments at the University of Chicago, University of Alaska, and Argonne National Laboratory. Mark has an undergraduate degree in Anthropology and master's and PhD degrees from the University of Chicago's Department of Near Eastern Languages and Civilizations.