Hello World: First Model Run#

Here we will choose a region, get the forcing data, and run a model. We will be using the Caravan forcing data and the HBV model. Mainly because these are the easiest to use and understand. More details of the HBV model and how to use it in eWaterCycle can be found here.

A model needs 3 things to run:

  1. A region (or basin) to run the model on.

  2. A time window to run the model on.

  3. Forcing data for the model to use (+ parameters (set)).

Again we are starting with the imports:

# General python
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

import numpy as np
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr

# Niceties
from rich import print

# General eWaterCycle modules
import ewatercycle
import ewatercycle.models
import ewatercycle.forcing

The HBV model is a lumped hydrological model. It considers a catchment as a homogeneous area and calculates the major hydrological processes in it. It requires spatially aggregated rainfall and potential evaporation as inputs (ie. forcings). To calculate its modelled discharge at the outlet of the catchment it also needs a set of parameters. Usually these parameters are calibrated using (historical) observational data, so this is also required. in eWaterCycle we provide access to the Caravan dataset, which contains all of the above data for all the catchments in the different Camels datasets. Again, the Caravan dataset is not the only dataset available in eWaterCycle, but it is the easiest to use and understand for a first model run. There is a known problem with the caravan evaporation data, so this is not the best dataset to use for this model. Using the interactive maps at eWaterCycle caravan map one can easily retrieve the identifier of the catchment. Note that changing the region below will work, but that the parameters for the calibration will need to be changed as well.

So let’s start with the region we want to run the model on.

camels_id = "camelsgb_73010"

Now we set the time window we want to run the model on.

experiment_start_date = "2000-08-01T00:00:00Z"
experiment_end_date = "2005-08-31T00:00:00Z"

Setting up the forcing data#

We start with creating a directory to store the forcing data.

forcing_path_caravan = Path.home() / "forcing" / camels_id / "caravan"
forcing_path_caravan.mkdir(exist_ok=True, parents=True)

We will create a caravan forcing object. The eWaterCycle module takes care of the details of the Caravan dataset.

# Generate forcing data

camels_forcing = ewatercycle.forcing.sources['CaravanForcing'].generate(
    start_time=experiment_start_date,
    end_time=experiment_end_date,
    directory=forcing_path_caravan,
    basin_id=camels_id,
)

Important NOTE We suggest that when you work with eWaterCycle yourself, you create different notebooks for generating the forcing data and running the model.

Now we can also load the forcing data:

# Loading the forcing data

camels_forcing = ewatercycle.forcing.sources['CaravanForcing'].load(directory=forcing_path_caravan)

Setting up the model#

Now that we have the forcing data, we can create a model run. For the HBV model, we need to set up the parameters and the initial conditions. For more details about calibration, please look at this section.

For now we have some optimized parameters for a specific research question and initial conditions for this specific region. For the initial conditions for the storages we can specify an array of starting values. If you don’t the model will start ‘empty’ and needs some timesteps to ‘fill up’. Especially for the rootzone storage it helps to not start empty. Note that all units are in mm.

parameter_set =  [
    7.085,  # Imax
    0.837,  # Ce
    76.373, # Sumax
    1.112,  # Beta
    0.245,  # Pmax
    7.801,  # Tlag
    0.096,  # Kf
    0.003,  # Ks
    0.226   # FM
]

initial_conditions = np.array([
    0,      # Si
    100,    # Su
    0,      # Sf
    5,      # Ss
    0       # Sp
])

Now that we have the forcing data and the parameters, we can create this specific HBV model object.

The setup needs 3 different steps:

  1. Creating a model object, an instance of the specific model class. This is provided by the different plugins. At the point of creation, the forcing object that will be used need to be passed to the model

  2. Creating a configuration file that contains all the information the model needs to initialize itself. The format of the configuration file is very model specific. For example, the HBV configuration file contains information on the location of forcing files and the values of parameters and initial storages.

  3. Initializing starts the model container, creates all variables, and basically gets the model primed to start running.

model = ewatercycle.models.HBV(forcing=camels_forcing)
config_file, _ = model.setup(parameters=parameter_set, initial_storage=initial_conditions)
model.initialize(config_file)

Running the model#

The basic loop that runs the model calls the model.update() to have the model progress one timestep and model.get_value() to extract information of interest. More complex experiment can interact with the model using, for example, model.set_value() to change values. In this way

  • multiple models can interact (including be coupled)

  • models can be adjusted during runtime to incoming observations (ie. data assimilation)

  • Effects not covered by the model can be calculated separately and included to create ‘what if’ experiments.

  • and many more applications.

discharge_list = []
time = []
while model.time < model.end_time:
    model.update()
    discharge_list.append(model.get_value("Q")[0])
    time.append(pd.Timestamp(model.time_as_datetime))

model.finalize()

After running the model we need to call model.finalize() to shut everything down, including the container. If we don’t do this, the container will continue to be active, eating up system memory.

Plotting the results#

Finally, we use standard python libraries to visualize the results. We put the model output into a pandas Series to make plotting easier.

model_output = pd.Series(data=discharge_list, name="Modelled_discharge", index=time)
ds_forcing = xr.open_dataset(camels_forcing['Q'])
model_output.plot()
ds_forcing["Q"].plot(label="Observed discharge")
plt.legend()
plt.ylabel("Discharge (mm/d)")

Saving the results#

# We want to also be able to use the output of this model run in different analyses. Therefore, we save it as a NetCDF file
xr_model_output = model_output.to_xarray()

xr_model_output.attrs['units'] = 'mm/d'

# Save the xarray Dataset to a NetCDF file
xr_model_output.to_netcdf(forcing_path_caravan / 'results' / 'modelled_river_discharge_data.nc')