Open-Earth-Monitor Cyberinfrastructure project 2023–2027: open environmental data to support EU’s Green Deal

OpenGeoHub
Published in Nerd For Tech
18 min read · Jul 13, 2023

Prepared by: Tom Hengl (OpenGeoHub), Leandro Parente (OpenGeoHub), Luca Brocca (CNR), Gregory Duveiller (Max Planck Institute for Biogeochemistry), Martin Herold (GFZ), Santiago Ferrer (Vizzuality)

The OEMC project was kick-started in June 2022. The first six months of the project were used to build a detailed implementation plan, outlined in this document. In general, the project aims at continuous development and release of a number of building blocks (back-end, front-end, software and data solutions) that are components of pan-EU and global monitors and that serve concrete use-cases, i.e. a diversity of user communities. The main development principles of the OEMC cyberinfrastructure are: (1) it is a federated / decentralized system, (2) it is primarily based on FOSS and aims at supporting open development communities, (3) it is user-centered, hence improvements are based on interaction and feedback from users via 32 use-cases with concrete partners, and (4) it is a genuine open data project based on FAIR principles. The key deliverables of the OEMC project will include: (WP2) Stakeholder committee and user engagement plans throughout project duration; (WP3) Open-Earth-Monitor computing engine and geodata cube; (WP4) Open-Earth-Monitor in-situ (ground) data services; (WP5) Open-Earth-Monitor suite of tools directly serving EU citizens and governance needs via easy-to-use data portals and apps; (WP6) Open-Earth-Monitor suite of tools serving global governance needs.

Introduction

The Open-Earth-Monitor Cyberinfrastructure (OEMC) project (Horizon Europe funding 2022–2027) aims at increasing European capability to generate timely, accurate, disaggregated, people-centered, accessible (GSM-compatible) and user-friendly environmental information based on Earth Observation (EO) data. We plan to achieve this by building a cyberinfrastructure anchored in FAIR data principles, leveraging and improving our existing platforms e.g. OpenEO.org, Geopedia.world, GlobalEarthMonitor.eu, EarthSystemDataLab.net, OpenLandMap.org, EcoDataCube.eu, Geo-wiki, LifeWatch.eu, XCUBE and EuroDataCube.com. The project is coordinated by the OpenGeoHub foundation and is closely aligned to the broader EuroGEO initiative.

OEMC runs in three main phases:

  1. implementation of the computing engine and in-situ O&M data services (2023–2025);
  2. direct application of the Open-Earth-Monitor to support EU Green Deal and other strategic actions (2024–2026);
  3. dissemination and engagement of stakeholders and target users through a series of open workshops, followed by revision of the tools to better fit their objectives and limitations (2023–2026).

We specifically target to contribute towards actions in the following directions:

The proposed OEMC system consists of three groups of components:

  1. [Humanware] Development teams: grouped around various tasks and aiming at co-designing and co-developing building blocks and serving use-cases. User communities can also be considered part of the humanware. Currently more than 100 people are directly involved in this project;
  2. [Dataware and Hardware] Data pools and back-end infrastructure: large datasets / databases and computing infrastructure, either on third-party hosting infrastructures or on local infrastructures;
  3. [Software] Software libraries, APIs, front-end solutions: R, Python etc. libraries, computational tutorials, and UI toolkits for importing, processing and serving environmental data at continental (EU) and global scales.

Dataware, hardware and software are considered to be the building blocks of the system.

OEMC project in a nutshell

OEMC project is, in a nutshell, a FOSS-based federated solution producing open data (see conformant licenses) and cost-effective data services aiming at supporting European Union programmes. Specific key project objectives are:

  1. Produce an inventory of user needs, data and knowledge that will be used to develop a general framework for increasing uptake and accessibility/exploitability of environmental observation information within the European Union (WP2).
  2. Design, implement and release an operational solution for processing and serving EO data, environmental in-situ-data, and AI, ML and HPC models (OEMC-computing-engine; WP3 and WP4).
  3. Design, implement and release a suite of intuitive tools to enable targeted end-users to monitor the status of natural resources at European / Global scales (WP5, WP6)
  4. Design, implement and release a comprehensive and systematic platform to enhance the FAIRness (findability, accessibility, interoperability and reusability) of environmental observation data (WP5, WP6).
  5. Achieve notable and permanent improvement in access for European stakeholders to existing European and Global environmental observation data and actionable information (WP2-WP7).

At the center of the project, i.e. the main focus of our work, are the so-called “Use-cases”, i.e. practical applications of the OEMC in which the partner organizations and their users are at the center of the design. Use-cases are jointly designed and implemented with independent third parties that are continuously kept in the loop, invited to join conferences and workshops, and are part of the OEMC Stakeholders Committee. The OEMC project, hence, largely follows the implementation design of the successfully completed e-shape and similar Horizon Europe projects under the EuroGEO umbrella.

OEMC general project structure: various building blocks are used to build solutions and serve concrete user communities (through use-cases).

For comparison, the e-shape project had a total of 37 pilots, grouped around 6 major topics. Each pilot was strictly structured, with the following clearly defined and tracked throughout the project: objectives, development partners, targeted users / clients, expected outcome, timeline, nature of the outcome, means of release / means of access, user perspective, impact on the EO community at large, openness and sharing option, and success stories. Likewise, the OEMC project has 32 use-cases that will try to mimic the structure of the e-shape project as much as possible. Below is an example of one of the larger use-cases, conducted jointly with UNCCD.

Example of a use-case: the OEMC project will support UNCCD in transitioning their Land Degradation Neutrality (LDN) tool from 300-m spatial resolution to potentially 30-m spatial resolution. This is an enormous improvement in the level of detail and could potentially bring an order of magnitude more users and more interest in the LDN project. The full list of use-cases is available at: https://earthmonitor.org/use-cases/.

Other important inspirations for the OEMC project design are: general software solutions enabling easier access to, and usability of, data across formats, such as GDAL and cloud-native geodata formats; reproducible computational notebooks such as Jupyter notebooks and R bookdown documents; and Geo-wiki and Mastodon-type (Fediverse) solutions for large networks of open development communities. In particular Mastodon (2.5 million users and about 10k servers based on data from June 2023), a free and open source microblogging network built as a decentralized federation of independently-operated servers, is picked as the ideal model for organizing geodata-producing communities.

Does the EO data science world need an alternative to Google Earth Engine?

Alphabet/Google’s Google Earth Engine is a complete cloud solution for geoprocessing the majority of EO data (especially public datasets such as Landsat, MODIS, Copernicus Sentinel etc.) and is considered by many to be “the most advanced cloud-based geospatial processing platform in the world”. It is a “free service” for academic and research use. This means that, as long as you know how to code (JavaScript), you can employ petabytes of satellite imagery and produce planetary-scale products and apps even without ANY budget for analysis. This is an amazing infrastructure, and it is in fact very kind of the Alphabet corporation to provide it to all citizens at no cost. Many academic organizations are basically completely migrating to Google Earth Engine (which is 100% cloud-based, so you really only need a laptop to build a complete solution, from idea to a web-GIS app that works on all platforms). More recently, Microsoft also provides the Planetary Computer and Bing Maps, and Amazon offers sponsored hosting for open data.

Since Google Earth Engine (GEE) already offers all EO data and since it is free, should we all maybe migrate to using GEE for our work? The community of GEE users is highly enthusiastic (see e.g. the awesome-gee-community-catalog by Samapriya Roy and colleagues), and the fact that somebody can develop a single script to produce a global map at very fine spatial resolution, plus an app, from scratch perfectly matches the ideas of FAIR research. So is GEE all we ever needed? Here are some cons:

  • GEE is free for research and education purposes only; Google’s terms of use specifically prohibit extraction of content from high resolution imagery; it is questionable whether municipalities and government agencies can use this infrastructure (free version) as an official service to people, not to mention startups and SMEs;
  • Even though use of GEE is nominally free, to upload/download additional data one needs to use other attached Google cloud services, which can come at significant cost;
  • Google can at any time change the terms of use or shut down GEE or similar. This means that you could potentially lose all your data and your apps could go offline. These are the terms of use that you accepted during registration.
  • You cannot easily deploy functions / solutions produced in other programming languages such as Python or R. You obviously cannot fork GEE and make your own version of the system, so true reproducibility is limited to running things inside Google’s infrastructure only.
  • Many question whether the strategy of the big tech companies to offer a “free ride” to millions of users, then profit from mining and commercializing users’ data, is ethical. Are corporations such as Alphabet/Google, Facebook and similar, with their secret business plans and strong monopolization, in fact, evil?

One of the motives of the OEMC project is, thus, to consider and seriously test possible FOSS-based, decentralized alternatives to massive EO computing infrastructures provided by the Silicon-Valley-based tech giants. We certainly do NOT recommend that you stop using commercial systems, including GEE. We, in fact, advise all users to get good at using all three: (1) FOSS-based not-for-profit services, (2) smaller-scale commercial services, and (3) commercial services provided by tech giants, as each has its advantages. Once you are able to test and use all three types of systems, you can always benchmark performance and costs and come to an optimal solution that suits your objectives and budget (which is often likely a combination of the three).

In the meantime you should also consider some non-commercial data storage and processing services provided by pan-EU organizations such as the European Space Agency (ESA) and the European Commission, but also by US federal agencies such as NASA and the National Science Foundation (unsorted):

  • OpenEO.cloud hosted by ESA: provides 1 month of free access to computing resources and petabytes of EO data, followed by support with using outsourced computing services;
  • EuroDataCube open data: serves a number of open and commercial datasets such as the global 10 m Sentinel mosaics and similar;
  • Copernicus Data Space: still largely under development, but will most likely become a central place to access various open EO datasets and computing infrastructure in 2024;
  • EarthCube.org infrastructure: provides mainly resources and datasets that demonstrate FAIR research and open data-centered solutions.

Likewise, OEMC will primarily try to develop solutions and examples (aiming at concrete use-cases) that contribute to the list of decentralized, open and FAIR-based tools and datasets.

Main development principles in OEMC project

The project development team has agreed to adhere to some minimum general development principles. The four most important development principles are that: (1) it is a federated / decentralized system, (2) it is primarily (or solely) based on FOSS, (3) it is a user-centered system, hence improvements are based on interaction and feedback from users, and (4) it is an open data project based on FAIR principles.

The system development will be also based on the following three key premises:

  1. We aim at building & serving analysis/decision-ready data: To increase usage of environmental information, it should be distributed as what users consider decision-ready data or (at least) analysis-ready data. Most current users of EO data have neither the expert domain capacity, nor often the interest, to prepare data until it can be freely and easily used for complex analysis or used directly to serve decision making. In addition, users do not require 3 or 4 overlapping datasets (e.g. Landsat, Sentinel, Proba-V land products), but ideally would prefer a single harmonized, complete, consistent, current and rapidly-updatable dataset. See for example the benchmarking dataset we prepared for the purpose of testing gap-filling algorithms. Another important aspect of decision-ready data is that the EO pixels can be directly related to bottom-up information coming from national or regional censuses and statistical offices.
  2. We aim at producing economically-assessed environmental information: Climate action or any similar large-scale environmental management/restoration will struggle until most citizens are aware of the financial benefits and co-benefits of ecosystem services. We believe that environmental information needs to be extended to include societal benefits generated by ecosystem services or costs of environmental pressures. Often, though not always, monetary information is easier to apply, for example, by users from the business community. In other words: Climate and Biodiversity Action will not be undertaken unless financial benefits and co-benefits are (more) clear.
  3. Our data solutions are user-centered: Users, i.e. people, should be central to co-designing a system and ought to be involved from the start of the implementation phase. We promote a hybrid bottom-up / top-down approach that would put users at the center of design, without losing on the speed and efficiency of development. In the hybrid approach, prototype “top-down products” are presented mid-way in the project to users for evaluation/recommendations. Users might not be fully aware of what can be done, but once they see it, they can express preferences/suggestions/requirements and get engaged with producing the final “bottom-up products”.

Main project outputs

The OEMC system will deliver a number of outputs that can be classified as:

  1. New software solutions to help implement deliverables and especially to serve WPs 3–6. These can include: (I) New or updated existing R, Python, Julia, OSGeo libraries; (II) Functions and services served through API; new standards e.g. for monitoring terrestrial biomass (see e.g. Labrière et al., 2023); (III) Front-end solutions: passive and/or interactive apps, web-mapping portals and dashboards;
  2. New value-added datasets at high spatial resolutions, served as Cloud-Optimized Analysis-Ready data, that will demonstrate the functionality and added value of combining Machine Learning (ML) as implemented in FOSS with massive environmental and EO datasets. These can include: (I) Vector (point, line, polygon) data: most importantly we will generate standardized, analysis-ready training data representing Observations & Measurements (O&M) from federated networks (see e.g. Calders et al., 2023), including citizen science data (Fraisl et al., 2023), that can be used to run machine learning and produce value-added decision-ready / analysis-ready datasets. This data will be entered into geospatial DBs and/or served through S3 via Cloud-Optimized formats e.g. FlatGeobuf, GeoParquet or similar. (II) Gridded spatiotemporal datasets (usually complete, consistent time-series of COGs or Zarr files) at various spatial resolutions (10, 25, 30, 100, 250, 1000 m) and various temporal support (daily, weekly, monthly, annual, long-term) covering the bounding box / mask of interest defined in the project (pan-EU and global, with special focus on the Tropics). (III) Sample datasets, i.e. smaller subsets that are used for testing and demo purposes. Small datasets will be best distributed in simple tabular formats e.g. as Simple Features or multiarrays with spatiotemporal coordinates of the center of pixels;
  3. Scientific materials: registered with a DOI and citable in the literature. This includes: (I) peer-reviewed scientific and technical publications, (II) blog posts, (III) lectures and demonstrations (multimedia materials);
  4. Use cases: demonstration of OEMC in action for solving real-life problems, serving concrete stakeholders, then receiving feedback and re-design, re-build and re-publish improvements;

You can follow the project outputs continuously via:

Even though the project follows a federated organization with each partner having significant creative freedom, each project output will follow some minimum quality criteria and the good practice guidelines. The minimum criteria include:

  • Required data and software licenses are used (also following the Consortium Agreement: “All data sets and software components produced in this project (project outputs; tasks outputs) produced by 100% Horizon-Europe-funded partners will be released under one of the Open Source (https://opensource.org/licenses/) and/or Open Data licences (https://opendefinition.org/licenses/) in a timely manner maximum one (1) month after the delivery of the corresponding deliverable”).
  • Official file naming system is used.
  • Standard recommended vocabularies (codes, variables names, keywords) are used.
  • Files are uploaded and/or registered using official registries / project management systems.
  • Complete metadata is provided passing a minimum (automated) check via Geonetwork and/or STAC.
  • Software and data outputs are following the project specifications. They pass validity checks as specified in the minimum requirements column.
  • For each output a support channel is available (GitHub, GitLab support channels or similar) where users can ask questions and report any bugs / issues.
  • New attached publications + DOI’s (i.e. how to cite data is specified) are registered in the OEMC catalogs.
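
The automated metadata check in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual validator: it only tests that a STAC Item (given as a plain dictionary) carries the top-level fields the STAC Item specification requires, plus a license rule. The function name `check_item_metadata`, the `ALLOWED_LICENSES` set and the license rule itself are assumptions made for the example.

```python
from __future__ import annotations

# Top-level fields that the STAC Item specification marks as required.
# (The conditional 'bbox' requirement is left out of this partial check.)
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)

# Hypothetical allow-list; a real one would follow the Consortium Agreement.
ALLOWED_LICENSES = {"CC-BY-4.0", "CC0-1.0", "ODbL-1.0"}


def check_item_metadata(
    item: dict, allowed_licenses: set[str] = ALLOWED_LICENSES
) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_ITEM_FIELDS
        if name not in item
    ]

    if "type" in item and item["type"] != "Feature":
        problems.append("type must be 'Feature'")

    properties = item.get("properties") or {}
    # STAC requires a 'datetime' property (its value may be null for ranges).
    if "datetime" not in properties:
        problems.append("missing property: datetime")

    # Project-specific rule (assumption): every output declares an open license.
    license_id = properties.get("license")
    if license_id not in allowed_licenses:
        problems.append(f"license not in allow-list: {license_id!r}")

    return problems
```

Returning a list of problems rather than raising on the first failure lets a registry show data producers everything that needs fixing in one pass.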

The key project deliverables in OEMC project will include:

  • (WP2) Stakeholder committee and user engagement plans throughout project duration;
  • (WP3) Open-Earth-Monitor computing engine and geodata cube;
  • (WP4) Open-Earth-Monitor in-situ (ground) data services;
  • (WP5) Open-Earth-Monitor suite of tools directly serving EU citizens and governance needs via easy-to-use data portals and apps;
  • (WP6) Open-Earth-Monitor suite of tools serving global governance needs.

Example of a planned typical geoprocessing workflow and OEMC building blocks. WP3 serves a number of software solutions (the OEMC library) that are used for specific tasks.

The back-end components Open-Earth-Monitor computing engine (WP3) and in-situ O&M data service are imagined here as the core functional components, i.e. building blocks, of the cyberinfrastructure that will ultimately support producing the most accurate and most complete and consistent analysis-ready data, which can then be shared via WP5 and WP6 (see complete list of monitors below). They might be made available to external parties in the later part of the project including as commercial services to ensure quality and sustainability.

Targeted list of pan-EU and global monitors. Monitors are led by various OEMC project partners.

OEMC monitors will be implemented as a 3-tier system:

  1. Tier 1: the central EarthMonitor.org App / viewer with quality-controlled layers and monitors;
  2. Tier 2: partner-based monitors and building blocks (federated approach);
  3. Tier 3: on-demand monitors that users can build rapidly with a few lines of code, i.e. by using out-of-the-box FOSS solutions such as G3W, Lizmap, xcube viewer, R Shiny apps or similar.

The Tier 2 building blocks (e.g. xcube viewer for EarthSystemDataLab.net, OpenLandMap, OpenEO.cloud editor, EuroDataCube.com, Geo-wiki, Geopedia.world, EcoDataCube.eu and similar) are at the center of the development. A selection of the successfully produced layers and solutions in Tier 2 is then integrated into a single seamless system: the central EarthMonitor.org App. Consider for example the predictions of the future vegetation (biomes) described in Bonannella et al. (2023). The most extensive version of the data is hosted on OpenLandMap.org, while a selection of layers that can support on-the-ground activities, serving specific use-cases and partner organizations, will be added to the World-reforestation monitor (WP6) and will be made available in combination with other layers from the Tier 2 stream.

The EarthMonitor.org App (the central landing page) will be a cloud-based service with a robust and secure back-end and front-end, and with data being updated on an annual, monthly or in some cases even weekly basis. The App will be accessible from a single landing page (a professional and user-experience-designed GUI) built on a single robust visualization framework. Users will be able to directly engage with so-called “geo-stories”, comparable to e.g. the Geostory extension in GeoNode. The geo-stories will be self-explanatory and will allow users to seamlessly visualize and experience spatial and temporal trends, events and effects of scenario testing. Their main purpose will be to quickly inform, explain and engage visitors regardless of their level of expertise.

What is an “environmental monitor”?

One of the main objectives of the OEMC project is to build a number of environmental monitors to serve concrete organizations / European Union programmes. But what is a “monitor”? In the EO context an environmental monitoring system typically implies:

a back-end front-end solution serving decision-ready data e.g. through a web-GIS + dashboard, and which shows current, past and/or future states of environment and environmental events which potentially affect quality of life of citizens and/or living beings.

Main targeted uses of an EO-based monitoring system are usually:

  1. To help raise awareness / warn users of potential negative trends, unexpected events and natural hazards or risks (hence we use geo-stories);
  2. To provide the most up-to-date information in a seamless visualization framework that is easy to interpret by the general public (ideal case) or at least by targeted professionals;
  3. To serve as an objective basis for decision making i.e. as a support to local, national and confederal governments;
  4. To serve as input to statistical offices to register and archive events;

Typical examples of EO-based monitors that we follow as a model include e.g. (unsorted):

Currently no single system in the world exists where users can track all aspects of environmental dynamics across borders. In fact, for many environmental processes, we are potentially not even aware of the trends, main drivers of dynamics or events, nor is there any decision-making / response. For example, we still know relatively little about land degradation, the causes of biodiversity loss, the hotspot locations of biodiversity decline, and why and where exactly some insect species, e.g. bees, are disappearing the most.

OEMC project is trying to bridge this gap, especially by demonstrating that distributed data can be seamlessly integrated into data dashboards and used to raise awareness and help support decision-makers.

Environmental monitoring systems can be classified based on three main aspects: (1) main type of monitoring, (2) main natural resource of interest, and (3) spatial and temporal coverage. In the OEMC project we also refer to the following classification system of monitors:

Based on the nature of the monitoring target:

  1. Human-caused-events monitors: focusing on distinct events caused by individuals / human activity such as oil spills, industrial pollution, GHG emissions, distinct change of land use, clear-cuttings / unregulated deforestation (see e.g. Camara et al., 2023) etc;
  2. Natural-hazard-events monitors: focusing on distinct natural events e.g. outbreaks of diseases, fires, earthquakes, flood events;
  3. Ecosystem-health monitors: focusing on longer-term processes, continuous activities and transitions e.g. climate change, air and litter pollution, loss of biodiversity etc;
  4. Socio-economic monitors: focusing on how socio-economic processes and events (including political decisions, which makes them country-based) impact environmental dynamics, for example night-lights dynamics.

Based on the main resource of interest / main theme (multiple themes can apply, but one is considered to be the main theme):

  1. Biodiversity;
  2. Forest resources;
  3. Soil resources;
  4. Fresh-water resources;
  5. Atmospheric resources;
  6. Oceans and seas;
  7. Mineral and geological resources;

Based on the temporal coverage:

  1. Real-time;
  2. With daily to weekly updates;
  3. With monthly to seasonal updates;
  4. With annual updates;

Based on spatial coverage:

  1. Global;
  2. Continental / regional;
  3. National;
  4. Provincial / local;
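
The classification above maps naturally onto a small data structure. The following is a minimal sketch, assuming each monitor is tagged with exactly one value per axis plus optional secondary themes. The class names, the `label` method and the example monitor are hypothetical and not part of any OEMC library.

```python
from dataclasses import dataclass
from enum import Enum


class MonitorTarget(Enum):
    HUMAN_CAUSED_EVENTS = "human-caused events"
    NATURAL_HAZARD_EVENTS = "natural-hazard events"
    ECOSYSTEM_HEALTH = "ecosystem health"
    SOCIO_ECONOMIC = "socio-economic"


class Theme(Enum):
    BIODIVERSITY = "biodiversity"
    FOREST = "forest resources"
    SOIL = "soil resources"
    FRESH_WATER = "fresh-water resources"
    ATMOSPHERE = "atmospheric resources"
    OCEANS_AND_SEAS = "oceans and seas"
    MINERAL_AND_GEOLOGICAL = "mineral and geological resources"


class UpdateFrequency(Enum):
    REAL_TIME = "real-time"
    DAILY_TO_WEEKLY = "daily to weekly"
    MONTHLY_TO_SEASONAL = "monthly to seasonal"
    ANNUAL = "annual"


class SpatialCoverage(Enum):
    GLOBAL = "global"
    CONTINENTAL_REGIONAL = "continental / regional"
    NATIONAL = "national"
    PROVINCIAL_LOCAL = "provincial / local"


@dataclass(frozen=True)
class MonitorProfile:
    """One monitor described along the four classification axes."""
    name: str
    target: MonitorTarget
    main_theme: Theme
    update_frequency: UpdateFrequency
    coverage: SpatialCoverage
    other_themes: frozenset = frozenset()  # secondary themes, if any

    def label(self) -> str:
        """Compact one-line summary, e.g. for a catalog listing."""
        return " | ".join([
            self.target.value,
            self.main_theme.value,
            self.update_frequency.value,
            self.coverage.value,
        ])


# Hypothetical example; the values are illustrative only.
example = MonitorProfile(
    name="Example forest monitor",
    target=MonitorTarget.ECOSYSTEM_HEALTH,
    main_theme=Theme.FOREST,
    update_frequency=UpdateFrequency.ANNUAL,
    coverage=SpatialCoverage.GLOBAL,
    other_themes=frozenset({Theme.BIODIVERSITY}),
)
print(example.label())
```

Keeping a single `main_theme` next to a set of `other_themes` mirrors the note that several themes can apply while one is considered the main theme.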

What makes an effective “environmental monitoring system” i.e. an effective dashboard UI? We recommend the following minimum checklist:

  • Widest possible domain of users (including with no GIS skills) should be able to engage with geo-stories. A possible solution is that users can directly run an animation by intuitively selecting a story with one-click as in e.g. Google Earth Engine time-lapse.
  • Users can see information about data sets. E.g.: text blocks are attached to geo-stories explaining where the data comes from, how to interpret it etc.
  • Geo-stories are relevant and user-centered, i.e. they are optimized based on the user requirements. A possible and partial solution is to track traffic (e.g. the most visited / best ranked geo-stories are listed at the top, making it easier for users to navigate through content), but even before that it is important to work together with user communities and collect their feedback (our WP2) before the construction of the monitors starts.
  • All content is provided in a way that allows users to share it, save it and embed it in their own blogs/websites. A possible solution is to use a URL bookmarking system that contains all visualization parameters, so that all users across platforms and browsers see exactly the same visualization.
  • Users should be able to easily orient themselves in the monitors and customize visualization. Possible solutions: spatial layers are combined with background maps and training ground-truth O&M; basic visualization tools such as transparency, slider-map-comparison, animation and effects, are seamlessly integrated into the dashboard.
  • Advanced users can also access data in professional data catalogs and download it for their own purpose, which can be achieved through automatically generated cross-links.
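
The URL bookmarking idea from the checklist can be illustrated with a short sketch that stores the full view state in the query string and restores it again. It uses only the Python standard library. The parameter names (`layer`, `lon`, `lat`, `zoom`, `date`, `opacity`) and the base URL are assumptions for illustration, not the scheme used by the EarthMonitor.org App.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def encode_view(base_url, layer, lon, lat, zoom, date, opacity=1.0):
    """Build a shareable URL that carries every visualization parameter."""
    params = {
        "layer": layer,
        # Fixed precision so the same view always yields the same URL.
        "lon": f"{lon:.5f}",
        "lat": f"{lat:.5f}",
        "zoom": str(int(zoom)),
        "date": date,
        "opacity": f"{opacity:.2f}",
    }
    return f"{base_url}?{urlencode(params)}"


def decode_view(url):
    """Recover the view state from a bookmarked URL."""
    query = parse_qs(urlsplit(url).query)
    first = {key: values[0] for key, values in query.items()}
    return {
        "layer": first["layer"],
        "lon": float(first["lon"]),
        "lat": float(first["lat"]),
        "zoom": int(first["zoom"]),
        "date": first["date"],
        "opacity": float(first["opacity"]),
    }


# Hypothetical base URL and layer name.
url = encode_view(
    "https://example.org/monitor",
    layer="forest_cover", lon=11.35, lat=46.5, zoom=6,
    date="2023-07-13", opacity=0.8,
)
view = decode_view(url)
print(url)
print(view)
```

Because the state lives entirely in the URL, a bookmark or embedded link reproduces the same map view without any server-side session.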

How can the OEMC project help you with your work?

If you are producing global or pan-EU environmental monitors, especially if you are aiming at producing and maintaining open geodata that can be used to raise awareness and help reach people on the ground, we could potentially host your data on our infrastructure. Please contact the project lead via: https://earthmonitor.org/contact-us/ and let us know about your project and how you think we could help you. You can also connect with us through https://twitter.com/EarthMonitorOrg and/or https://fosstodon.org/tags/OpenEarthMonitor.

If you are looking for commercial solutions, i.e. customized services that you can use at shorter notice (potentially within days or weeks), please contact our commercial partners in the consortium directly:

OEMC project in 2023/2024

What to expect from this project in 2023/2024? We are planning a number of workshops and hackathons at the Open-Earth-Monitor Global Workshop 4–6 October 2023, Eurac Earth Observations Institute, Bolzano (Italy) “Connecting open EO solutions to boost European and global goals”. Come and meet the consortium and interact with the key developers and components of the systems. Help us build better software and data for global good!

OpenGeoHub: a not-for-profit research foundation that promotes open geographical and geo-scientific data and develops open source software.