Soil and Agronomy Data Cube for Africa at 30-m spatial resolution

Prepared by: Tom Hengl (OpenGeoHub) and Leandro Parente (OpenGeoHub)

Earth Observation, soil, terrain, land cover and land use, and climate data are increasingly available for Africa for research and business use. This tutorial explains how to access the iSDAsoil property and nutrient maps for Africa, together with a number of Sentinel-2 cloud-free bands and terrain variables, and how to compute with them without needing to download terabytes of data. A complete tutorial written using Rmarkdown is available here. To learn more about Cloud-Optimized GeoTIFFs and geocomputing in Python, please also visit this tutorial.

iSDAsoil methodology and data

Innovative Solutions for Decision Agriculture Ltd (iSDA) is a social enterprise with the mission to improve smallholder farmer profitability across Africa. In November 2020, iSDA released a fully-fledged Soil Information System of Africa at 30-m spatial resolution (data available under the Creative Commons Attribution license). The main purpose of this data is to help with the implementation of Integrated Soil Fertility Management (ISFM) and other sustainable soil management practices in Africa. Production of this data set is documented in detail in this Medium article, and also in this peer-reviewed publication:

  • Hengl, T., Miller, M.A.E., Križan, J. et al. (2021) African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Sci Rep 11, 6130. https://doi.org/10.1038/s41598-021-85639-y

The predictions are now available as Cloud-Optimized GeoTIFFs through a number of services (Wasabi, Google Earth Engine, Amazon AWS public datasets), so developers and users can access them without restrictions.

The repository https://github.com/iSDA-Africa/ also contains examples with iSDAsoil worked out in Python (how to estimate liming requirements for an area in Rwanda and similar).

What is a “Cloud-Optimized GeoTIFF” (COG)?

Cloud-Optimized GeoTIFFs are post-processed images that are optimized for file sharing. Because they can serve spatial queries, they can be considered roughly equivalent to geospatial databases. This is possibly the best way to distribute spatial layers without restrictions: users save time accessing data and can load it directly into the majority of GIS software (mainly thanks to the GDAL development team).

How does a COG work? A COG file has a spatial index based on tiles and scales. Imagine you wish to overlay a single point to get the value inside the COG: an HTTP service will first locate the tile (usually very fast), then locate the exact pixel inside that tile, and finally return the value (always numeric). In summary, as long as you only plan to access small portions of the data, a COG typically works very fast and is about as efficient as accessing and searching a geospatial database. What can limit COG services is the bandwidth, the number of requests per IP, the size of the data returned per request and similar. Also note that, if you use GDAL or any GDAL-compatible GIS software (we recommend QGIS), all processing of the data is done by your local machine: the COG service only serves the data. It is also a highly portable system, as you only have to copy/upload the GeoTIFFs and make them available (ideally via Amazon S3 or another S3-compatible service).
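
The lookup described above can be sketched in a few lines of Python. This is only an illustration of the tile-and-scale idea, not GDAL's actual implementation: the grid origin, the 512-pixel tile size and the list of overview factors are assumptions made for the example, and `locate` and `overview_factor` are hypothetical helper names.

```python
import math

# Illustrative grid definition (assumed values, not read from a real file).
ORIGIN_X, ORIGIN_Y = -32.0, 38.0   # upper-left corner in degrees
PIXEL = 0.00027                    # pixel size in degrees (roughly 30 m)
TILE = 512                         # assumed internal tile size in pixels


def locate(lon, lat):
    """Map a point to its pixel, the tile holding it, and the offset in that tile."""
    col = math.floor((lon - ORIGIN_X) / PIXEL)
    row = math.floor((ORIGIN_Y - lat) / PIXEL)   # rows count downwards from the top
    return {
        "pixel": (col, row),
        "tile": (col // TILE, row // TILE),
        "offset": (col % TILE, row % TILE),
    }


def overview_factor(requested_pixel, factors=(1, 2, 4, 8, 16, 32, 64)):
    """Pick the coarsest stored scale that is still at least as fine as requested.

    Simplified rule for illustration; real readers apply their own selection logic.
    """
    usable = [f for f in factors if PIXEL * f <= requested_pixel]
    return max(usable) if usable else 1


point = locate(ORIGIN_X + 1000.5 * PIXEL, ORIGIN_Y - 600.5 * PIXEL)
print(point)
print(overview_factor(0.01))  # a map view at roughly 1-km detail
```

Only the tile that contains the point (or the tiles of the chosen coarser scale, for a zoomed-out view) needs to be transferred, which is why point queries and previews stay fast even for very large files.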

How to use COGs?

There are two main recommended ways to use COGs for modeling and visualization:

  1. Load the data directly into a GIS / Geoserver, then select analysis of interest for the bounding box of interest.

  2. Access the data programmatically from R or Python, then run the analysis in code for the bounding box of interest.

In practice, we recommend using both access paths at the same time, so that you can visually validate the analysis and its results. In the simplest terms: keep a view of the data open in QGIS, then use R / Python to program the analysis. For spatial analysis we recommend accessing the data primarily using the rasterio package in Python and/or the terra / rgdal packages in R.

This is for example the web address of the soil texture classes for Africa at 30-m resolution:

https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif

Important note: please do NOT open this URL in a browser because the total file size is 2.1GiB and your browser will directly try to download the file. Instead, you can add this URL to e.g. QGIS by using:

  1. Select “Layer” → “Add Layer” → “Add Raster Layer”;

Consequently, you should see the following:

iSDAsoil layers (soil texture class) visualized in QGIS.

This looks like the whole map is available locally on your machine, but it is NOT: it is only the preview of the data at some aggregated scale that is actually downloaded. Google Earth and many other web-GIS applications basically work the same (download based on the viewing angle and scale).

In the case above, we have also added the legend by downloading the SLD file for soil texture classes. Note that once you have connected to the COG in QGIS, you can run any spatial analysis that is available in the software. Just keep in mind that any time you wish to run an analysis on a larger part of the data, QGIS will have to download ALL data for that bounding box, and this can get time-consuming.

To access the same layer from R, we would run:

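library(terra)
# The /vsicurl/ prefix tells GDAL to read the remote file over HTTP
tif = "/vsicurl/https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif"
# Connect to the COG; only the header metadata is requested at this point
r = rast(tif)
r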
class : SpatRaster
dimensions : 268670, 327948, 1 (nrow, ncol, nlyr)
resolution : 0.00027, 0.00027 (x, y)
extent : -31.46424, 57.08172, -34.89109, 37.64981 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
data source : sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif
names : sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1

The output shows that this is indeed a large GeoTIFF in WGS84 coordinates with a spatial resolution of about 30 m. Again, R/terra did not download the whole image; it only “connected” to the file and requested some metadata from the file header. Once connected to the COG from R, we can continue with all standard spatial analysis, e.g. crop values, do raster calculations etc. Assuming we are only interested in the values for a single country, this is a very efficient system, as you only download the minimum data needed for the analysis. The total size of the iSDAsoil layers is about 1.5TiB, so we definitely do not recommend downloading all files covering the whole of Africa.
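
For readers working in Python, the same idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the tutorial's own code: the bounding box for Rwanda is approximate and chosen only for illustration, and `pixel_window` and `read_bbox` are hypothetical helper names. The grid constants are copied from the terra printout above. The rasterio import sits inside `read_bbox` so that the window arithmetic can be run without rasterio or network access.

```python
import math

COG_URL = (
    "https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/"
    "sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif"
)

# Grid definition copied from the terra printout.
XMIN, YMAX = -31.46424, 37.64981
RES = 0.00027
NROW, NCOL = 268670, 327948

# Approximate bounding box of Rwanda (xmin, ymin, xmax, ymax); illustrative only.
RWANDA_BBOX = (28.86, -2.84, 30.90, -1.05)


def pixel_window(bbox):
    """Convert a lon/lat bounding box to (col_off, row_off, width, height) in pixels."""
    xmin, ymin, xmax, ymax = bbox
    col_off = math.floor((xmin - XMIN) / RES)
    row_off = math.floor((YMAX - ymax) / RES)
    col_end = math.ceil((xmax - XMIN) / RES)
    row_end = math.ceil((YMAX - ymin) / RES)
    return col_off, row_off, col_end - col_off, row_end - row_off


def read_bbox(bbox, url=COG_URL):
    """Read only the pixels inside bbox from the remote COG (needs rasterio and network)."""
    import rasterio
    from rasterio.windows import from_bounds

    with rasterio.open(url) as src:
        window = from_bounds(*bbox, transform=src.transform)
        return src.read(1, window=window)


col_off, row_off, width, height = pixel_window(RWANDA_BBOX)
share = (width * height) / (NROW * NCOL)
print(f"window: {width} x {height} pixels")
print(f"share of the full raster: {share:.4%}")
```

Calling `read_bbox(RWANDA_BBOX)` would then request only the part of the file overlapping that window, which is a small fraction of the continental raster.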

Modeling cropland distribution as a function of climate, terrain and soils

In the OpenLandMap tutorial listed on GitLab you can also find an example of how the Soil and Agronomy Data Cube for Africa can be used to model the distribution of cropland as a function of climate, terrain and soils. To avoid excessive computing, we limit the analysis to Ethiopia and 1-km spatial resolution. The tutorial explains how to: (1) download, resample and crop the layers of interest to a polygon map of Ethiopia, (2) load all data into R, (3) fit a Random Forest model that explains the distribution of cropland, and (4) predict cropland in Ethiopia assuming a (hypothetical) linear decrease of rainfall in the future.

In the simplest terms, the distribution of cropland can be modeled as:

Cropland ~ f( rainfall, soil properties, terrain / slope, … )

In the example in the tutorial, we actually use ALL pixels in the 1-km images to fit the model. This is possible because the target and ALL covariate layers are available as images; keep in mind that the resulting training matrix is large! To make computing efficient, we use the C++/ranger implementation of random forest (Wright & Ziegler, 2017). The modeling result shows:

## Ranger result
##
## Call:
## ranger(fm.crop, data = et.sp1km@data[sel, ], num.trees = 85, importance = "impurity")
##
## Type: Regression
## Number of trees: 85
## Sample size: 1084576
## Number of independent variables: 8
## Mtry: 2
## Target node size: 5
## Variable importance mode: impurity
## Splitrule: variance
## OOB prediction error (MSE): 31.36564
## R squared (OOB): 0.913671
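
The out-of-bag figures in this printout can be unpacked with a little arithmetic. The sketch below assumes that ranger reports R squared for regression as 1 - MSE / var(y) and that its default mtry is the rounded-down square root of the number of covariates; the variable names are chosen here for illustration only.

```python
import math

# Values copied from the ranger printout above.
oob_mse = 31.36564
oob_r2 = 0.913671
n_covariates = 8

# Typical prediction error, in the units of the target variable.
rmse = math.sqrt(oob_mse)

# Rearranging R2 = 1 - MSE / var(y) gives the variance of the target implied by the printout.
implied_variance = oob_mse / (1.0 - oob_r2)
implied_sd = math.sqrt(implied_variance)

# Default number of covariates tried at each split.
default_mtry = math.floor(math.sqrt(n_covariates))

print(f"OOB RMSE: {rmse:.2f}")
print(f"implied variance of the target: {implied_variance:.1f}")
print(f"implied standard deviation of the target: {implied_sd:.1f}")
print(f"default mtry for {n_covariates} covariates: {default_mtry}")
```

Read this way, the OOB error of about 5.6 is small compared with the spread of the cropland values themselves, which is what the high R squared expresses.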

In this case the results show that the model fits the data well (OOB R-squared of about 0.91), and elevation and rainfall come out as the most important variables overall. It is good to see that soil pH is also an important covariate, although in this specific case cropland seems to be predominantly controlled by climate. The comparison of actual vs potential cropland shows that one can indeed expect a serious decrease in cropland distribution assuming a decrease in rainfall:

Actual (left) versus potential (right) cropland cover assuming linear drop in rainfall of 30%.

Interested in this type of modeling? Test the iSDAsoil layers / access the data from QGIS and/or R/Python. Run analysis and document your code via github/gitlab; then share the results via Twitter or Medium, and please mention @iSDAAfrica and #SoilData4Africa so we can also follow the progress.

Layers currently available for Africa

Within the iSDAsoil project, we have made a number of layers available as COGs, i.e. for public access and use without restrictions (no registration needed, no access costs):

  • iSDAsoil layers representing soil properties and nutrients at two standard depth intervals 0–20 and 20–50 cm;

The Sentinel mosaics for Africa (prepared by MultiOne.hr) are relatively large in size and might still contain artifacts between scenes and missing values beyond water bodies etc. The population density map at 30-m spatial resolution does NOT include some areas such as Sudan’s and Somalia.

To list all layers available at 30-m resolution for the whole of Africa please use this table. To list all layers available at 250-m resolution (global land mask) please use this table. Note: the file versions might change hence your code would need to be updated. Please subscribe to this repository or refer to https://isda-africa.com for the most up-to-date information about iSDAsoil.

Important note: We do NOT recommend downloading whole GeoTIFFs of Africa at 30-m resolution, as these are usually 10–20GB in size (per file). The total size of the repository at the moment exceeds 1.5TB. Instead, if you need to analyze the whole land mask of Africa, we recommend downloading the files directly from zenodo.org and/or Amazon AWS. Also note that nutrient stocks and aggregated soil properties can be derived using a variety of procedures (see e.g. Hengl & MacMillan (2019)), so the total values might differ between procedures.
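
As one example of such an aggregation procedure, the two standard depth intervals (0–20 and 20–50 cm) can be combined into a single 0–50 cm value using a thickness-weighted average. This is a minimal sketch of one common approach, not necessarily the procedure used for any iSDAsoil product; `depth_weighted_mean` is a hypothetical helper and the clay values are made up for illustration.

```python
def depth_weighted_mean(top, sub, top_thickness=20.0, sub_thickness=30.0):
    """Thickness-weighted average of a soil property over two stacked layers.

    top: value for the upper layer (default 0-20 cm, i.e. 20 cm thick)
    sub: value for the lower layer (default 20-50 cm, i.e. 30 cm thick)
    """
    total = top_thickness + sub_thickness
    return (top * top_thickness + sub * sub_thickness) / total


# Made-up example: clay content of 30% in the topsoil and 40% in the subsoil.
clay_0_50 = depth_weighted_mean(30.0, 40.0)
print(f"clay content, 0-50 cm: {clay_0_50:.1f}%")
```

A different choice, for example weighting by bulk density as well as by thickness, would give a different aggregated value, which is why totals derived by different procedures may not agree.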

Other data providers of interest

Other data sources (not included in this Data Cube) and data portals for Africa with Earth Observation and similar data sets:

A more detailed review of the Earth Observation (EO) data services for Africa and trends can be found in Woldai (2020).

Further reading:

FAO, Global Soil Partnership (GSP), (2016). Boosting Africa’s Soils. FAO Regional Conference for Africa (ARC), http://www.fao.org/3/a-i5532e.pdf

Hengl, T., & MacMillan, R. A. (2019). Predictive soil mapping with R (p. 370). Lulu.com. Retrieved from https://soilmapper.org

Hengl, T., Leenaars, J. G., Shepherd, K. D., Walsh, M. G., Heuvelink, G. B., Mamo, T., … others. (2017). Soil nutrient maps of Sub-Saharan Africa: assessment of soil nutrient content at 250 m spatial resolution using machine learning. Nutrient Cycling in Agroecosystems, 109(1), 77–102. doi:10.1007/s10705-017-9870-x

Hengl, T., Miller, M.A.E., Križan, J. et al. (2021) African soil properties and nutrients mapped at 30 m spatial resolution using two-scale ensemble machine learning. Sci Rep 11, 6130. doi:10.1038/s41598-021-85639-y

Hijmans, R. J., Bivand, R., Forner, K., Ooms, J., & Pebesma, E. (2020). terra: Spatial Data Analysis. CRAN. Retrieved from https://rspatial.org/terra

Sarago, V., Barron, K., Albrecht, J. (2019). Pushing for adoption of Cloud Optimized GeoTIFF: An imagery format for cloud-native geospatial processing. http://cogeo.org

Woldai, T. (2020). The status of Earth Observation (EO) & Geo-Information Sciences in Africa–trends and challenges. Geo-spatial Information Science, 23(1), 107–123. doi:10.1080/10095020.2020.1730711

Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://www.jstatsoft.org/article/view/v077i01

