Introduction

In hydrology, scientists try to better quantify the movement of water in, out of, and through the land surface and rivers in order to better predict droughts, floods, navigation hazards, and reservoir operations (Wood et al., 2011). In addition, better understanding of hydrological processes will allow determination of anthropogenic and climate change impacts on the hydrological cycle (McMillan et al., 2020).

From a hydrological point of view, every field, every street, and every part of the world is different. We may have adequate descriptions of how water moves through plants and soils at small scales, but the medium is never the same from one spot to the next. This is the curse of locality (Bierkens, 2015). Nonlinear processes need to be integrated over time and space in the presence of tremendous natural and human-made heterogeneity.

Traditionally, the hydrological community has dealt with this curse by building custom models for small natural watersheds (Beven, 2001). Hydrologists often work with “effective parameters” (Kirchner, 2006; Bárdossy and Singh, 2008), parameters that cannot be measured directly but describe aggregated movement of water through the environment. Typical models are partially based on first-principle physics (mass, momentum, and energy balances) and partially on (statistical or heuristic) assumptions that simplify the complex reality.

Hydrologists from all over the world use local process knowledge to conceptualize many local models. This leads to a plethora of local models that differ greatly in the exact methodologies applied, the competing hypotheses of hydrologic behavior they encode, and the technology stacks and programming languages they use (Hutton et al., 2016; Hut et al., 2017). Although a wealth of knowledge is encoded in these models, this knowledge is hardly shared due to the technical difficulties of working with models created by other researchers. As a result, models are often chosen based on availability and familiarity, rather than suitability for the research performed (Addor and Melsen, 2019), severely hindering scientific progress in the hydrological community.

Fortunately, the lack of sharing of scientific results has recently been picked up as an important topic by the wider academic community. Open Science, where data, publications, and software are shared publicly, is deemed increasingly important by both researchers and funders (Hall et al., 2022). In addition, the FAIR principles (Wilkinson et al., 2016) state that all data resulting from research must be made Findable, Accessible, Interoperable, and Reusable (FAIR). Recently, FAIR has also been extended to software (https://fair-software.eu/, last access: 28 June 2022) and other academic output.

To enable significant progress in the sharing of hydrological knowledge, we introduce the concept of FAIR hydrological models. FAIR hydrological models make it possible for other researchers to use a model to generate novel scientific results without needing extensive support from the original authors. Any pre-existing hydrological model can be made FAIR by adding open interfaces and documentation. With a model that has been made FAIR, it is not only possible to re-create the experiments done by the authors of that model but also to perform novel research by applying the model to different locations with different settings or input data. For a model to be FAIR, not only do the software and required data need to be available, but the model also needs to be properly documented and have well-defined interfaces. FAIR hydrological models have a lower barrier to entry and create scientific results that are as open and FAIR as possible, thereby truly enabling researchers to build on each other's results.

In this paper we introduce the eWaterCycle platform, a platform where hydrologists can work with each other or add their own FAIR models. To the best of our knowledge, eWaterCycle is the first platform for hydrological modeling that focuses on providing access to pre-existing models and data sources in such a way that the platform handles the computer and data science aspects, allowing hydrologists to focus on the hydrology. The goal of the eWaterCycle platform is to be a trailblazer for making hydrological modeling open and FAIR. The eWaterCycle platform is designed to support researchers, including graduate students (MSc and PhD levels), to run hydrological experiments with ease, focusing on the science rather than the technology. The platform is designed to enable users to run a simple experiment, such as generating a hydrograph for a certain catchment with a certain model, within minutes of getting started, while also giving users the freedom to perform very advanced experiments, such as multi-model coupling and intervening in model states during runtime. With this, we aim to reduce the cycle time in going from idea to experiment (from months to days) and to support fully reproducible experiments. We embrace what has been done already by not rewriting all models from scratch but sharing, reusing, coupling, and building on existing models. The methods and technology developed within the eWaterCycle platform are reusable both inside and outside of hydrology.

To illustrate how to use the eWaterCycle platform as a hydrologist and to demonstrate the type of experiments one can do on the platform (coupling, calibrating, comparing scenarios, etc.), a series of Jupyter notebooks that showcase the platform is provided with this paper. For scientists who want to work with the platform as users, a separate video with a hands-on demonstration of the models is available on YouTube (https://youtu.be/eE75dtIJ1lk, last access: 28 June 2022), and for archiving purposes it is also available on Zenodo (Hut, 2021). The remainder of this paper covers the rationale and the technology behind the platform.

In the eWaterCycle project, we strongly believe that the best approach for generating impact is to build on existing efforts as much as possible. We make use of container technology, specifically Docker (https://www.docker.com, last access: 28 June 2022) and Singularity (https://sylabs.io/singularity/, last access: 28 June 2022), which allows software environments to be captured and preserved. Pre-processing of forcing data is done using the ESMValTool (Righi et al., 2020), originally developed in the climate sciences. We use Jupyter (https://jupyter.org, last access: 28 June 2022) as the main user interface to our system. The Basic Model Interface (BMI) (Hutton et al., 2020) provides a stable, easy-to-implement interface to an existing model. For sharing software, data, and results in a FAIR manner, we rely on GitHub (https://github.com, last access: 28 June 2022), HydroShare (Horsburgh et al., 2015), and Zenodo (https://zenodo.org, last access: 28 June 2022). Any software contributions that we create in the eWaterCycle project are purposely small and independent, and thus there is a high chance that our components are reusable by other projects in turn.

The eWaterCycle platform can incorporate any existing model with ease, be it conceptual, semi-distributed, or distributed, in any commonly used programming language. An alternative and often-used approach to making hydrological models FAIR is to create a single model framework that incorporates as many model concepts as possible. This usually requires significant modifications to a model's code, and sometimes even a complete rewrite, to fit within the model framework. The result is a coherent set of models that can be interchanged relatively easily. Examples of such model frameworks include wflow (Schellekens et al., 2020), SUMMA (Clark et al., 2015), MARRMoT (Knoben et al., 2019), and Raven (Craig et al., 2020). This approach can be seen as orthogonal to the eWaterCycle platform, as both approaches can be combined. In the eWaterCycle platform, we have incorporated the wflow and MARRMoT frameworks, making it possible to use any of the models within these frameworks within eWaterCycle. Incorporating the SUMMA framework is also on the list of long-term goals for the eWaterCycle platform but has not been finished at the time of writing.

There are several other platforms that support open and FAIR hydrology and that eWaterCycle connects to. HydroShare (Horsburgh et al., 2015) focuses on making hydrological data FAIR. It offers a service to publish and access datasets in a Jupyter notebook environment. Unlike eWaterCycle, HydroShare offers no support when using datasets for an experiment and simply provides data as is. HydroShare can be used to store data resulting from eWaterCycle experiments in a FAIR manner.

A more structured way of acquiring data for use in a hydrological model is used in HydroDS (Gichamo et al., 2020). HydroDS provides a web service where users can call upon the service to download data in a format suitable for their model. HydroShare and HydroDS can also be combined to generate and store data needed to run models (Gan et al., 2020).

The Community Surface Dynamics Modeling System (CSDMS) (Tucker et al., 2022) community gathers a large number of hydrological models in a model repository. This repository contains metadata on models and the source code. In addition, model authors are encouraged to add a BMI to their model to facilitate cooperation between models. BMI simplifies the use of code, but the installation and compilation of scientific code is often non-trivial and proves a practical bottleneck. The eWaterCycle platform builds on BMI by using containerized models, which offer an easily reproducible model software environment. This includes support for generating forcing and other needed input for each model, allowing scientists to build on all data and models eWaterCycle provides access to.
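To give a feel for the kind of interface BMI describes, the following is a minimal sketch of a toy model that exposes a handful of BMI-style methods (initialize, update, get_value, set_value, finalize). It is an illustration under stated assumptions, not a complete BMI implementation and not code from eWaterCycle or any model it hosts: the class ToyReservoirBmi, the helper run, the variable names "storage" and "discharge", and the use of a dictionary instead of a configuration file are all hypothetical simplifications.

```python
# Toy linear-reservoir model exposing a few BMI-style methods.
# Hypothetical illustration only: not a complete BMI implementation and
# not code from eWaterCycle or any model it hosts.


class ToyReservoirBmi:
    """Single bucket that drains at a rate proportional to its storage."""

    def initialize(self, config):
        # Real BMI models receive the path of a configuration file; a dict
        # is used here (an assumption) to keep the sketch self-contained.
        self._k = config["k"]                      # parameter [1/day]
        self._dt = config["dt"]                    # time step [day]
        self._precip = list(config["precip"])      # forcing [mm/day]
        self._storage = config["initial_storage"]  # state [mm]
        self._discharge = self._k * self._storage  # derived output [mm/day]
        self._step = 0

    def update(self):
        # Advance the model by exactly one time step.
        p = self._precip[self._step]
        self._discharge = self._k * self._storage
        self._storage += (p - self._discharge) * self._dt
        self._step += 1

    def get_current_time(self):
        return self._step * self._dt

    def get_end_time(self):
        return len(self._precip) * self._dt

    def get_value(self, name):
        # Real BMI copies values into a caller-provided array; returning a
        # float is a simplification made for this sketch.
        return {"storage": self._storage, "discharge": self._discharge}[name]

    def set_value(self, name, value):
        # Only the state variable can be overwritten from outside.
        if name != "storage":
            raise KeyError(name)
        self._storage = value

    def finalize(self):
        # Release resources; here there is only the forcing list to drop.
        self._precip = []


def run(model, config):
    """Drive any object offering the methods above and record discharge."""
    model.initialize(config)
    hydrograph = []
    while model.get_current_time() < model.get_end_time():
        model.update()
        hydrograph.append(model.get_value("discharge"))
    model.finalize()
    return hydrograph
```

Because run only relies on the method names, any other model offering the same methods could be passed in without changing the calling code, which is the property that makes a common interface useful for comparing and coupling models.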

The example Jupyter notebooks provided with this paper demonstrate how hydrologists can, for example, couple two models written in different programming languages, calibrate models, or run “what if?” scenarios using existing models from research groups all over the world that are forced with datasets from different data providers, all without having to install a single package on their own laptops. To make full use of the knowledge created in hydrology, we need to be able to stand on each other's shoulders, which eWaterCycle facilitates.

The rest of this paper is organized as follows. Section 2 introduces the eWaterCycle platform and describes the different functionalities it provides. It explains how each part of the platform contributes to open and FAIR hydrological science. Section 3 presents the technical design and implementation of the platform. Section 4 presents a number of case studies that demonstrate the capabilities of the eWaterCycle platform and highlights the diverse set of supported use cases. Section 5 concludes and discusses future work.

Glossary

In this paper, we use the following terminology. We acknowledge that different fields of science and even different scientists within single fields may use different definitions of these terms (Venhuizen et al., 2019), and we provide these definitions here purely to clarify how we use those terms in this paper and within the eWaterCycle platform.

Hydrological model. A piece of software code that calculates stores and fluxes of water in, through, and on the surface of the Earth. Most hydrological models need forcing (input) such as precipitation and produce outputs such as river discharge. Many hydrological models require parameters.

Parameter. A constant input (in time, not necessarily in space) that a model needs to calculate the outputs. Parameters can either be calibrated based on (often historical) forcing and output data or be derived from third-party sources, including (but not limited to) digital elevation maps and soil maps. A parameter does not change over the runtime of the experiment.

Forcing. A time-varying input that a model needs to calculate the outputs. In hydrological models the most common forcing is precipitation data.

Input(s). Forcings and parameters.

State. All variables calculated by the model that are needed to calculate the next time step.

Output(s). Any variable calculated by the model that is stored or shared with the experimenter and can be used for analysis. Outputs can be either state variables or derived from state variables (and inputs). For example, in the PCR-GLOBWB model, “channel storage” is a state variable that is updated every time step, while “river discharge” is a value calculated using “channel storage”. Both river discharge and channel storage are outputs of the PCR-GLOBWB model.

Observations. Any data derived from observations, direct or indirect, of the Earth system. Observations can be used as forcing, such as precipitation; as parameters, for example soil maps; or as validation for a model output, such as the often-used river discharge.

FAIR hydrological model. A hydrological model that is findable, accessible, interoperable, and reusable; see Wilkinson et al. (2016). Note that “accessible” in this context means that it must be clear to anyone how the model can be accessed and not necessarily that it is openly available to anyone.

Open Science. The principle of openly sharing all aspects of the scientific endeavor within ethical and legal limits; see Hall et al. (2022).

Experiment. A set of hydrological model runs, using inputs, generating and analyzing outputs. An experiment can include actively intervening in the state of the model during runtime. Within eWaterCycle, an experiment is described in a Jupyter notebook. Experiments can be as simple as generating a single hydrograph for a single catchment or as complicated as coupling multiple global models and forcing them with different datasets. See Sect. 4 for example experiments in the eWaterCycle platform.

The eWaterCycle platform. The combination of the core eWaterCycle software, all models contributed to eWaterCycle, and all available input datasets. At the time of writing, the platform is hosted on demonstration infrastructure provided by SURF at https://www.ewatercycle.org/demo (last access: 28 June 2022). The platform is designed to be deployed by system administrators on any sufficiently high-performance infrastructure. Depending on future funding streams, eWaterCycle will be made available on publicly accessible infrastructure for the entire hydrological community.
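The distinction between parameter, forcing, state, and output in the definitions above can be made concrete with a small sketch. The single-bucket model below is hypothetical and chosen only for brevity; the function simulate, its arguments, and the numbers used are assumptions for illustration and do not correspond to any model in the eWaterCycle platform.

```python
# Hypothetical single-bucket model used only to illustrate the glossary terms.


def simulate(k, precipitation, initial_storage=0.0):
    """Run a toy bucket model and return its outputs.

    k               -- parameter: constant in time [1/step]
    precipitation   -- forcing: one value per time step [mm/step]
    initial_storage -- starting value of the state [mm]
    """
    storage = initial_storage  # state: needed to calculate the next time step
    outputs = {"storage": [], "discharge": []}
    for p in precipitation:
        # Discharge is derived from the state; it is an output but not a state.
        discharge = k * storage
        # Water balance: new storage = old storage + inflow - outflow.
        storage = storage + p - discharge
        # Both the state variable and the derived variable are stored as outputs.
        outputs["storage"].append(storage)
        outputs["discharge"].append(discharge)
    return outputs


# A small "experiment": one model and one parameter value, two forcing scenarios.
baseline = simulate(k=0.5, precipitation=[4.0, 0.0, 0.0])
wetter = simulate(k=0.5, precipitation=[8.0, 0.0, 0.0])
```

Here storage plays the role that “channel storage” plays in the PCR-GLOBWB example, and discharge the role of “river discharge”: the first is carried from one time step to the next, the second is recomputed from it.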