PCRGlobWB example use case

PCRGlobWB example use case#

This example shows how the PCRGlobWB model can be used within the eWaterCycle system. It is assumed you have already seen this tutorial notebook explaining how to run the simple HBV model for the River Leven at Newby Bridge.

The PCRGlobWB model is an example of a distributed model where fluxes and stores in the balance are calculated for grid cells (often also called pixels). This requires both the forcing data as well as any parameters to also be spatially distributed. Depending on the complexity of the model, these datasets can be quite large in memory size.

Here we will be running PCRGLobWB for Great Brittain and will extract discharge data at the location of the River Leven again, to compare with the HBV model run. We will also demonstrate how to interact with the state of the model, during runtime, showcasing the benefit of using the BMI interface when building experiments using models.

# This cell is only used to suppress some distracting output messages
import warnings

warnings.filterwarnings("ignore", category=UserWarning)

import matplotlib.pyplot as plt
from cartopy import crs
from cartopy import feature as cfeature
from rich import print
import pandas as pd
import xarray as xr
from pathlib import Path
from datetime import datetime
from ipywidgets import IntProgress
from IPython.display import display
import fiona
import shapely.geometry
from pyproj import Geod

import ewatercycle.forcing
import ewatercycle.models
import ewatercycle.parameter_sets

station_latitude = 54.26935849558577  # Newby Bridge location from Google Maps
station_longitude = -2.9710855713537745

# Selecting our region again
camelsgb_id = "camelsgb_73010"

forcing_path_caravan = Path.home() / "forcing" / camelsgb_id / "caravan"
forcing_path_caravan.mkdir(exist_ok=True, parents=True)

pcr_glob_directory = Path("/data/shared/parameter-sets/pcrglobwb_global")
prepared_PCRGlob_forcing = Path.home() / "forcing" / camelsgb_id / "pcrglobwb" / "work/diagnostic/script"

experiment_start_date = "1997-08-01T00:00:00Z"

# 2 end dates: one use for is for the full run, but that takes a long time.
experiment_end_date = "2000-08-31T00:00:00Z"
# experiment_end_date = "1997-08-31T00:00:00Z"

Loading a parameter set#

For this example we have prepared and hosted a global parameter set made by Utrecht University. For each model run, what needs to be specified to deliniate the region of interest is a “clone map”. The config file has many options, one of which is the location of this clone map.

Note that this is very specific to PCRGlobWB. For complex (and legacy) models like PCRGlobWB one needs to know quite detailed information about the model before being able to run it. However, using eWaterCycle does reduce the time for seting up the model and getting it to run.

parameter_set = ewatercycle.parameter_sets.ParameterSet(
    name="custom_parameter_set",
    directory=pcr_glob_directory,
    config="./pcrglobwb_uk_05min.ini",
    target_model="pcrglobwb",
    supported_model_versions={"setters"},
)

print(parameter_set)

ParameterSet(
    name='custom_parameter_set',
    directory=PosixPath('/data/shared/parameter-sets/pcrglobwb_global'),
    config=PosixPath('pcrglobwb_uk_05min.ini'),
    doi='N/A',
    target_model='pcrglobwb',
    supported_model_versions={'setters'},
    downloader=None
)

Load forcing data#

For this example case, the forcing is generated in this seperate notebook. This is a common practice when generating forcing takes considerable (CPU, memory, disk) resources.

In the cell below, we load the pre-generated forcing. Note that in contrast with HBV, PCRGlobWB only needs temperature and precipitation as forcing inputs. Reminder: HBV also needs potential evaporation. PCRGlobWB calculated potential and actual evaporation as part of its update step.

forcing = ewatercycle.forcing.sources["PCRGlobWBForcing"].load(
    directory=prepared_PCRGlob_forcing,
)

print(forcing)

PCRGlobWBForcing(
    start_time='1997-08-01T00:00:00Z',
    end_time='2000-08-31T00:00:00Z',
    directory=PosixPath('/home/mmelotto/forcing/camelsgb_73010/pcrglobwb/work/diagnostic/script'),
    shape=PosixPath('/home/mmelotto/forcing/camelsgb_73010/pcrglobwb/work/diagnostic/script/camelsgb_73010.shp'),
    filenames={},
    precipitationNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_pr_1997-2000_camelsgb_73010.nc',
    temperatureNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_tas_1997-2000_camelsgb_73010.nc'
)

Setting up the model#

Note that the model version and the parameterset versions should be compatible.

pcrglob = ewatercycle.models.PCRGlobWB(
    parameter_set=parameter_set,
    forcing=forcing
)

print(pcrglob)

PCRGlobWB(
    parameter_set=ParameterSet(
        name='custom_parameter_set',
        directory=PosixPath('/data/shared/parameter-sets/pcrglobwb_global'),
        config=PosixPath('pcrglobwb_uk_05min.ini'),
        doi='N/A',
        target_model='pcrglobwb',
        supported_model_versions={'setters'},
        downloader=None
    ),
    forcing=PCRGlobWBForcing(
        start_time='1997-08-01T00:00:00Z',
        end_time='2000-08-31T00:00:00Z',
        directory=PosixPath('/home/mmelotto/forcing/camelsgb_73010/pcrglobwb/work/diagnostic/script'),
        shape=PosixPath('/home/mmelotto/forcing/camelsgb_73010/pcrglobwb/work/diagnostic/script/camelsgb_73010.shp'
),
        filenames={},
        precipitationNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_pr_1997-2000_camelsgb_73010.nc',
        temperatureNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_tas_1997-2000_camelsgb_73010.nc'
    )
)

pcrglob.version

'setters'

eWaterCycle exposes a selected set of configurable parameters. These can be modified in the setup() method.

print(pcrglob.parameters)

dict_items([('start_time', '1997-08-01T00:00:00Z'), ('end_time', '1997-08-01T00:00:00Z'), ('routing_method', 
'accuTravelTime'), ('max_spinups_in_years', '0')])

Calling setup() will start up the model container. Be careful with calling it multiple times!

cfg_file, cfg_dir = pcrglob.setup(
    end_time=experiment_end_date,
    # end_time="2000-08-31T00:00:00Z",  # original end date takes about 22 minutes when alone on server, this is loaded later on, so no need to run it now :)
    max_spinups_in_years=0
)

cfg_file, cfg_dir

('/home/mmelotto/projects/book/tutorials_examples/2_HBV_PCRGlobWB_ERA5/pcrglobwb_20250715_095005/pcrglobwb_ewatercycle.ini',
 '/home/mmelotto/projects/book/tutorials_examples/2_HBV_PCRGlobWB_ERA5/pcrglobwb_20250715_095005')

print(pcrglob.parameters)

pcrglob_para = pcrglob.parameters

# Convert ISO 8601 strings to datetime objects
start_time = datetime.strptime(experiment_start_date, '%Y-%m-%dT%H:%M:%SZ')
end_time = datetime.strptime(experiment_end_date, '%Y-%m-%dT%H:%M:%SZ')

# Calculate the number of days for the progression bar
delta = end_time - start_time
number_of_days = delta.days
print(f"Number of days to model: {number_of_days}")

dict_items([('start_time', '1997-08-01T00:00:00Z'), ('end_time', '2000-08-31T00:00:00Z'), ('routing_method', 
'accuTravelTime'), ('max_spinups_in_years', '0')])

Number of days to model: 1126

Note that the parameters have been changed. A new config file which incorporates these updated parameters has been generated as well. If you want to see or modify any additional model settings, you can acces this file directly. When you’re ready, pass the path to the config file to initialize().

pcrglob.initialize(cfg_file)

We prepare a small dataframe where we can store the discharge output from the model

time = pd.date_range(pcrglob.start_time_as_isostr, pcrglob.end_time_as_isostr)
timeseries = pd.DataFrame(
    index=pd.Index(time, name="time"), columns=["PCRGlobWB: Leven"]
)

timeseries.head()

	PCRGlobWB: Leven
time
1997-08-01 00:00:00+00:00	NaN
1997-08-02 00:00:00+00:00	NaN
1997-08-03 00:00:00+00:00	NaN
1997-08-04 00:00:00+00:00	NaN
1997-08-05 00:00:00+00:00	NaN

Running the model#

Simply running the model from start to end is straightforward. At each time step we can retrieve information from the model.

# Progress bar, since this can take a while
f = IntProgress(min=0, max=number_of_days) # instantiate the bar
display(f) # display the bar

while pcrglob.time < pcrglob.end_time:
    pcrglob.update()

    # Track discharge at station location
    discharge_at_station = pcrglob.get_value_at_coords(
        "discharge", lat=[station_latitude], lon=[station_longitude]
    )
    time = pcrglob.time_as_isostr
    timeseries.loc[time, "PCRGlobWB: Leven"] = discharge_at_station[0]

    # Update progress bar
    f.value += 1

print("Model run finished!")

Model run finished!

Cleaning up#

Models usually perform some “wrap up tasks” at the end of a model run, such as writing the last outputs to disk and releasing memory. In the case of eWaterCycle, another important teardown task is destroying the container in which the model was running. This can free up a lot of resources on your system. Therefore it is good practice to always call finalize() when you’re done with an experiment.

pcrglob.finalize()