Workshop Caravan: Choose Region & Calibrate Model

Workshop Caravan: Choose Region & Calibrate Model#

Welcome to the eWaterCycle Caravan workshop!

Here you can choose a Camels catchment from the Caravan dataset, with which we can do HBV modelling! After that will manually calibrate the model to fit your region. Later on you could also use algorithms to calibrate our model. (but this will take some time)

Please keep in mind that I am looking for ways to improve our platform, if you found something hard to understand, let me know!

Importing modules#

Starting with all the libraries and utility functions. We have some general Python packages that we use. And then the eWaterCycle packages, this is where we import our ‘interface’, models & our way of loading generalised forcings for all supported models. Also I import some utility functions just for this notebook that I made to make it easier.

NOTE geopandas and folium will be installed, as they are not installed by default. But you will need to restart the kernel! Do this by going to the top-left “Kernel” –> “Restart Kernel…” OR click the “rewind” icon left of the “forward” icon, which is left of “Markdown”.

# General python
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

import importlib
import subprocess
import sys
import numpy as np
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
import os
from IPython.display import display
import json

# Niceties
from rich import print

# General eWaterCycle
import ewatercycle
import ewatercycle.models
import ewatercycle.forcing

# Check and install geopandas and folium
def install_if_missing(package_name):
    try:
        importlib.import_module(package_name)
        print(f"'{package_name}' is already installed.")
    except ImportError:
        print(f"'{package_name}' not found. Installing...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
        print(f"PLEASE RESTART KERNEL")

install_if_missing("geopandas")
install_if_missing("folium")

import geopandas as gpd
import folium

# Utility functions
from util_functions import *

'geopandas' is already installed.

'folium' is already installed.

Select your region#

Using the interactive maps at eWaterCycle caravan map one can easily retrieve the identifier of the catchment. We will save our chosen region and manual calibration.

# Make a settings file to store our settings
settings = dict()

# Your CAMEL ID 
# HURONS (RIVIERE DES) EN AVAL DU RUISSEAU SAINT-LOUIS-2 is selected by default
camel_id = "hysets_02OJ024"

# We use this function to see the available discharge data block(s).
# We then pick the longest block as starting and end dates.
start_calibration_block, end_calibration_block = get_time_blocks_available_discharge_data(camel_id)

# Done to clear the temporary files this function creates
!cd ~ && rm -rf tmp/ 

Block 1: 1981-01-01 00:00:00 → 2003-01-08 00:00:00  (Length: 8042 days 00:00:00)
Block 2: 2007-01-22 00:00:00 → 2013-10-21 00:00:00  (Length: 2464 days 00:00:00)
Block 3: 2003-01-10 00:00:00 → 2005-10-14 00:00:00  (Length: 1008 days 00:00:00)
Block 4: 2005-10-19 00:00:00 → 2006-11-10 00:00:00  (Length: 387 days 00:00:00)

Now we will save our starting dates and we will use the settings file to save the paths for future needs.

settings["caravan_id"] = camel_id

# Choose your start & end date of your calibration or use the longest block
start_date = "1985-08-01T00:00:00Z"
end_date = "1995-07-31T00:00:00Z"
# Or
# start_date = start_calibration_block
# end_date = end_calibration_block

settings["calibration_start_date"] = start_date
settings["calibration_end_date"] = end_date

settings["future_start_date"] = "2029-08-01T00:00:00Z"
settings["future_end_date"] = "2049-07-31T00:00:00Z"


home_path = Path.home()

settings["base_path"] = str(home_path / "my_data" / "workshop_canada" )

settings["path_caravan"] =  str(Path(settings["base_path"]) / "forcing_data" / settings["caravan_id"] / "caravan")
Path(settings["path_caravan"]).mkdir(exist_ok=True, parents=True)

settings['path_shape'] = settings["path_caravan"]  + f"/{camel_id}.shp"

settings["path_ERA5"] =  str(Path(settings["base_path"]) / "forcing_data" / settings["caravan_id"] / "ERA5")
Path(settings["path_ERA5"]).mkdir(exist_ok=True, parents=True)

settings["path_CMIP6"] =  str(Path(settings["base_path"]) / "forcing_data" / settings["caravan_id"] / "CMIP6")
Path(settings["path_CMIP6"]).mkdir(exist_ok=True, parents=True)

settings["path_output"] =  str(Path(settings["base_path"]) / "output_data" / settings["caravan_id"] )
Path(settings["path_output"]).mkdir(exist_ok=True, parents=True)

Calibration the HBV model#

There are several ways to calibrate the model:

This simple interactive way, as you will see it quite hard to calibrate it properly.
Use a calibration algorithm, but this will take longer (and is less fun :) )

We are going to use a ‘local’ HBV model for this calibration, meaning that this is run in the cell itself, so it will not start up a container and work in the normal ways eWaterCyle works.

# Custom function to create a data object, which will work with eWaterCycle  
# NOTE: this is already a newer version of the CaravanForcing funtion, normally this will be under: ewatercycle.forcing.CaravanForcing
caravan_forcing_object = CaravanForcing.generate(
    start_time=settings["calibration_start_date"],
    end_time=settings["calibration_end_date"],
    directory=settings["path_caravan"],
    basin_id=settings["caravan_id"],
)

# Parameters for the interactive calibrating
params = {
        'I_max': {'min': 0, 'max': 10},
        'Ce': {'min': 0.1, 'max': 1},
        'Su_max': {'min': 40, 'max': 800},
        'beta': {'min': 0.5, 'max': 5},
        'P_max': {'min': 0.001, 'max': 0.3},
        'T_lag': {'min': 1, 'max': 10},
        'Kf': {'min': 0.01, 'max': 0.2},
        'Ks': {'min': 0.0001, 'max': 0.01},
        'FM': {'min': 0.001, 'max': 5}
    }

# Rewrite the forcing for the interactive calibration plot, this is also used for the observed streamflow
calibrate_forcing = load_data_HBV_local(caravan_forcing_object)
# The interactive calibration plot
interactive_plot(HBV_snow, calibrate_forcing, params)

# Add your parameters here, the once already there are from a SCE Calibration
parameters_found = [
0,   # Imax - Interception capacity [mm]
922,   # Ce - Soil runoff coefficient [-]
075,  # Sumax - Max soil moisture storage [mm]
703,    # Beta - Shape parameter for runoff generation [-]
001,    # Pmax - Percolation threshold [mm/day]
018,    # Tlag - Routing lag time [days]
100,   # Kf - Fast runoff recession coefficient [1/day]
006,   # Ks - Slow runoff recession coefficient [1/day]
244   # FM - Snowmelt factor
]

# Store the parameters found
settings["hbv_parameters_manual"] = parameters_found

# Write to a JSON file
with open("settings.json", "w") as json_file:
    json.dump(settings, json_file, indent=4)

print(caravan_forcing_object)

CaravanForcing(
    start_time='1985-08-01T00:00:00Z',
    end_time='1995-07-31T00:00:00Z',
    directory=PosixPath('/home/mmelotto/my_data/workshop_canada/forcing_data/hysets_02OJ024/caravan'),
    shape=PosixPath('/home/mmelotto/my_data/workshop_canada/forcing_data/hysets_02OJ024/caravan/hysets_02OJ024.shp'
),
    filenames={
        'tasmax': 'hysets_02OJ024_1985-08-01_1995-07-31_tasmax.nc',
        'tasmin': 'hysets_02OJ024_1985-08-01_1995-07-31_tasmin.nc',
        'Q': 'hysets_02OJ024_1985-08-01_1995-07-31_Q.nc',
        'pr': 'hysets_02OJ024_1985-08-01_1995-07-31_pr.nc',
        'tas': 'hysets_02OJ024_1985-08-01_1995-07-31_tas.nc',
        'evspsblpot': 'hysets_02OJ024_1985-08-01_1995-07-31_evspsblpot.nc'
    }
)

Setting up the model#

Now that we have a calibrated model we can start to run things as they are supposed to. Meaning a containerized model will be run on our supercomputer.

Below we show the core use of eWaterCycle: to users models are objects. We stay as close as possible to the standard BMI functions, but make sure that under the hood eWaterCycle makes everything run. Getting a model to run requires three steps:

creating a model object, an instance of the specific model class. This is provided by the different plugins. At the point of creation, the forcing object that will be used need to be passed to the model:

#               Si,  Su, Sf, Ss, Sp
s_0 = np.array([0,  100,  0,  5,  0])

model = ewatercycle.models.HBV(
    forcing=caravan_forcing_object
)

creating a configuration file that contains all the information the model needs to initialize itself. The format of the configuration file is very model specific. For example, the HBV configuration file contains information on:

the location of forcing files
the values of parameters and initial storages

config_file, _ = model.setup(
    parameters=parameters_found, 
    initial_storage=s_0
)

initializing starts the model container, creates all variables, and basically gets the model primed to start running.

model.initialize(config_file)

Running the model#

The basic loop that runs the model calls the model.update() to have the model progress one timestep and model.get_value() to extract information of interest. More complex experiment can interact with the model using, for example, model.set_value() to change values. In this way:

multiple models can interact (including be coupled).
models can be adjusted during runtime to incoming observations (ie. data assimilation).
Effects not covered by the model can be calculated seperatly and included to create ‘what if’ experiments.
and many more applications.

# Setting lists to fill with data in the model runs
Q_m = []
time = []

# The model run
while model.time < model.end_time:
    model.update()
    Q_m.append(model.get_value("Q")[0])
    time.append(pd.Timestamp(model.time_as_datetime))

After running the model we need to call model.finalize() to shut everything down, including the container. If we don’t do this, the container will continue to be active, eating up system memory.

model.finalize()

Checking our results#

Finally, we use standard python libraries to visualize the results. We put the model output into a pandas Series to make plotting easier.

model_output = pd.Series(data=Q_m, name="Modelled_discharge", index=time)

calibrate_forcing["Q"].plot(label="Observed discharge")
model_output.plot(label="Modelled discharge")

plt.legend()
plt.grid()
plt.ylabel("Discharge [mm/d]")
plt.xlabel("Year")
plt.show()

../../_images/07dd31267fd310289f4dfd435fa5086b6c0c88fa0480993c4a944deff5481ee7.png

!rm -rf hbv_*