PCRGlobWB example use case#
This example shows how the PCRGlobWB model can be used within the eWaterCycle system. It is assumed you have already seen this tutorial notebook explaining how to run the simple HBV model for the River Leven at Newby Bridge.
The PCRGlobWB model is an example of a distributed model where fluxes and stores in the balance are calculated for grid cells (often also called pixels). This requires both the forcing data as well as any parameters to also be spatially distributed. Depending on the complexity of the model, these datasets can be quite large in memory size.
Here we will be running PCRGLobWB for Great Brittain and will extract discharge data at the location of the River Leven again, to compare with the HBV model run. We will also demonstrate how to interact with the state of the model, during runtime, showcasing the benefit of using the BMI interface when building experiments using models.
# This cell is only used to suppress some distracting output messages
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
import matplotlib.pyplot as plt
from cartopy import crs
from cartopy import feature as cfeature
from rich import print
import pandas as pd
import xarray as xr
from pathlib import Path
from datetime import datetime
from ipywidgets import IntProgress
from IPython.display import display
import fiona
import shapely.geometry
from pyproj import Geod
import ewatercycle.forcing
import ewatercycle.models
import ewatercycle.parameter_sets
station_latitude = 54.26935849558577 # Newby Bridge location from Google Maps
station_longitude = -2.9710855713537745
# Selecting our region again
camelsgb_id = "camelsgb_73010"
forcing_path = Path("/data/datasets/egu/forcing")
prepared_forcing_path_caravan_central = forcing_path / camelsgb_id / "caravan"
shapeFile = prepared_forcing_path_caravan_central / f"{camelsgb_id}.shp"
pcr_glob_directory = Path("/data/shared/parameter-sets/pcrglobwb_global")
prepared_PCRGlob_forcing = forcing_path / "UK/work/diagnostic/script"
pcr_model_run = Path("/data/datasets/egu/pcrglob_model_run")
experiment_start_date = "1997-08-01T00:00:00Z"
# 2 end dates: one use for this tutorials loaded forcing and one for the tutorial, which will be 5 time steps, to reduce computation time. The full date will be used later.
# experiment_end_date = "2000-08-31T00:00:00Z"
experiment_end_date = "1997-08-31T00:00:00Z"
Loading a parameter set#
For this example we have prepared and hosted a global parameter set made by Utrecht University. For each model run, what needs to be specified to deliniate the region of interest is a “clone map”. The config file has many options, one of which is the location of this clone map.
Note that this is very specific to PCRGlobWB. For complex (and legacy) models like PCRGlobWB one needs to know quite detailed information about the model before being able to run it. However, using eWaterCycle does reduce the time for seting up the model and getting it to run.
parameter_set = ewatercycle.parameter_sets.ParameterSet(
name="custom_parameter_set",
directory=pcr_glob_directory,
config="./pcrglobwb_uk_05min.ini",
target_model="pcrglobwb",
supported_model_versions={"setters"},
)
print(parameter_set)
ParameterSet( name='custom_parameter_set', directory=PosixPath('/data/shared/parameter-sets/pcrglobwb_global'), config=PosixPath('pcrglobwb_uk_05min.ini'), doi='N/A', target_model='pcrglobwb', supported_model_versions={'setters'}, downloader=None )
Load forcing data#
For this example case, the forcing is generated in this seperate notebook. This is a common practice when generating forcing takes considerable (CPU, memory, disk) resources.
In the cell below, we load the pre-generated forcing. Note that in contrast with HBV, PCRGlobWB only needs temperature and precipitation as forcing inputs. Reminder: HBV also needs potential evaporation. PCRGlobWB calculated potential and actual evaporation as part of its update step.
forcing = ewatercycle.forcing.sources["PCRGlobWBForcing"].load(
directory=prepared_PCRGlob_forcing,
)
print(forcing)
PCRGlobWBForcing( start_time='1997-08-01T00:00:00Z', end_time='2000-08-31T00:00:00Z', directory=PosixPath('/data/datasets/egu/forcing/UK/work/diagnostic/script'), shape=PosixPath('/data/datasets/egu/forcing/UK/work/diagnostic/script/camelsgb_73010.shp'), filenames={}, precipitationNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_pr_1997-2000_camelsgb_73010.nc', temperatureNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_tas_1997-2000_camelsgb_73010.nc' )
Setting up the model#
Note that the model version and the parameterset versions should be compatible.
pcrglob = ewatercycle.models.PCRGlobWB(
parameter_set=parameter_set,
forcing=forcing
)
print(pcrglob)
PCRGlobWB( parameter_set=ParameterSet( name='custom_parameter_set', directory=PosixPath('/data/shared/parameter-sets/pcrglobwb_global'), config=PosixPath('pcrglobwb_uk_05min.ini'), doi='N/A', target_model='pcrglobwb', supported_model_versions={'setters'}, downloader=None ), forcing=PCRGlobWBForcing( start_time='1997-08-01T00:00:00Z', end_time='2000-08-31T00:00:00Z', directory=PosixPath('/data/datasets/egu/forcing/UK/work/diagnostic/script'), shape=PosixPath('/data/datasets/egu/forcing/UK/work/diagnostic/script/camelsgb_73010.shp'), filenames={}, precipitationNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_pr_1997-2000_camelsgb_73010.nc', temperatureNC='pcrglobwb_OBS6_ERA5_reanaly_1_day_tas_1997-2000_camelsgb_73010.nc' ) )
pcrglob.version
'setters'
eWaterCycle exposes a selected set of configurable parameters. These can be modified in the setup()
method.
print(pcrglob.parameters)
dict_items([('start_time', '1997-08-01T00:00:00Z'), ('end_time', '1997-08-01T00:00:00Z'), ('routing_method', 'accuTravelTime'), ('max_spinups_in_years', '0')])
Calling setup()
will start up the model container. Be careful with calling it multiple times!
cfg_file, cfg_dir = pcrglob.setup(
end_time=experiment_end_date,
# end_time="2000-08-31T00:00:00Z", # original end date takes about 22 minutes when alone on server, this is loaded later on, so no need to run it now :)
max_spinups_in_years=0
)
cfg_file, cfg_dir
('/home/mmelotto/projects/book/tutorials_examples/2_HBV_PCRGlobWB_ERA5/pcrglobwb_20250423_120850/pcrglobwb_ewatercycle.ini',
'/home/mmelotto/projects/book/tutorials_examples/2_HBV_PCRGlobWB_ERA5/pcrglobwb_20250423_120850')
print(pcrglob.parameters)
pcrglob_para = pcrglob.parameters
# Convert ISO 8601 strings to datetime objects
start_time = datetime.strptime(experiment_start_date, '%Y-%m-%dT%H:%M:%SZ')
end_time = datetime.strptime(experiment_end_date, '%Y-%m-%dT%H:%M:%SZ')
# Calculate the number of days for the progression bar
delta = end_time - start_time
number_of_days = delta.days
print(f"Number of days to model: {number_of_days}")
dict_items([('start_time', '1997-08-01T00:00:00Z'), ('end_time', '1997-08-31T00:00:00Z'), ('routing_method', 'accuTravelTime'), ('max_spinups_in_years', '0')])
Number of days to model: 30
Note that the parameters have been changed. A new config file which incorporates these updated parameters has been generated as well. If you want to see or modify any additional model settings, you can acces this file directly. When you’re ready, pass the path to the config file to initialize()
.
pcrglob.initialize(cfg_file)
We prepare a small dataframe where we can store the discharge output from the model
time = pd.date_range(pcrglob.start_time_as_isostr, pcrglob.end_time_as_isostr)
timeseries = pd.DataFrame(
index=pd.Index(time, name="time"), columns=["PCRGlobWB: Leven"]
)
timeseries.head()
PCRGlobWB: Leven | |
---|---|
time | |
1997-08-01 00:00:00+00:00 | NaN |
1997-08-02 00:00:00+00:00 | NaN |
1997-08-03 00:00:00+00:00 | NaN |
1997-08-04 00:00:00+00:00 | NaN |
1997-08-05 00:00:00+00:00 | NaN |
Running the model#
Simply running the model from start to end is straightforward. At each time step we can retrieve information from the model.
# Progress bar, since this can take a while
f = IntProgress(min=0, max=number_of_days) # instantiate the bar
display(f) # display the bar
while pcrglob.time < pcrglob.end_time:
pcrglob.update()
# Track discharge at station location
discharge_at_station = pcrglob.get_value_at_coords(
"discharge", lat=[station_latitude], lon=[station_longitude]
)
time = pcrglob.time_as_isostr
timeseries.loc[time, "PCRGlobWB: Leven"] = discharge_at_station[0]
# Update progress bar
f.value += 1
print("Model run finished!")
Model run finished!
Interacting with the model#
PCRGlobWB exposes many variables. Just a few of them are shown here:
list(pcrglob.output_var_names)[-15:-5]
['upper_soil_transpiration',
'snow_water_equivalent',
'total_runoff',
'transpiration_from_irrigation',
'fraction_of_surface_water',
'bottom_elevation_of_uppermost_layer',
'industry_water_withdrawal',
'relativeGroundwaterHead',
'total_fraction_water_allocation',
'groundwater_volume_estimate']
Model fields can be fetched as xarray objects (or as flat numpy arrays using get_value()
):
da = pcrglob.get_value_as_xarray("discharge")
da.thin(5) # only show every 5th value in each dim
<xarray.DataArray (longitude: 29, latitude: 22)> Size: 5kB array([[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.14018524e+00, 1.41760421e+00, 2.15468740e+00, 1.46832180e+00, 1.54563653e+00, 1.04310191e+00, 2.13118052e+00, 2.94179392e+00, 1.93340278e+00, 6.04768515e-01, 5.11597216e-01], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.03143501e+00, 0.00000000e+00, 1.69181716e+00, 2.49883103e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.37508106e+00, 6.69850111e+00, 4.09394693e+00, 4.96979177e-01, 1.71060002e+00, 1.98450994e+00, 2.88379312e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 7.76562512e-01, 2.24806595e+01, 9.51388896e-01, 6.60960674e-01, 4.21274605e+01, 8.23425949e-01], [0.00000000e+00, 0.00000000e+00, 2.04119205e+00, 0.00000000e+00, 0.00000000e+00, 2.89075232e+00, 0.00000000e+00, 0.00000000e+00, ... 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 3.02674103e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]]) Coordinates: * latitude (latitude) float64 176B -5.958 -5.542 -5.125 ... 2.375 2.792 * longitude (longitude) float64 232B 49.04 49.46 49.88 ... 59.88 60.29 60.71
Xarray makes it very easy to plot the data. In the figure below, we add a cross at the location where we collected the discharge every timestep: Leven at Newby bridge.
fig = plt.figure(dpi=120)
ax = fig.add_subplot(111, projection=crs.PlateCarree())
da.plot(ax=ax, cmap="GnBu")
# Overlay ocean and coastlines
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.RIVERS, color="k")
ax.coastlines()
# Add a red cross marker at the location of the Leven River at Newby Bridge
ax.scatter(station_longitude, station_latitude, s=250, c="r", marker="x", lw=2)
<matplotlib.collections.PathCollection at 0x7fdbe8777500>

We can get (or set) the values at custom points as well:
# Extra
timeseries.plot()
plt.ylabel("Discharge $[m^3/s]$")
Text(0, 0.5, 'Discharge $[m^3/s]$')

Switching back to the loaded results#
We of course want to compare this both to observations as well as to the result of the HBV model.
# Switching back
experiment_start_date = "1997-08-01T00:00:00Z"
experiment_end_date = "2000-08-31T00:00:00Z"
# Read the saved data and convert the time
timeseries = pd.read_csv(pcr_model_run / "PCRGlob_complete_model_run.csv")
timeseries["time"] = pd.to_datetime(timeseries["time"])
timeseries = timeseries.set_index("time")
camelsgb_forcing = ewatercycle.forcing.sources['CaravanForcing'].load(directory=prepared_forcing_path_caravan_central)
xr_camelsgb_forcing = xr.open_dataset(camelsgb_forcing['Q'])
xr_hbv_model_output = xr.open_dataset('~/river_discharge_data.nc')
# flux_out_Q unit conversion factor from mm/day to m3/s
conversion_mmday2m3s = 1 / (1000 * 86400)
shape = fiona.open(camelsgb_forcing.shape)
poly = [shapely.geometry.shape(p["geometry"]) for p in shape][0]
geod = Geod(ellps="WGS84")
poly_area, poly_perimeter = geod.geometry_area_perimeter(poly)
catchment_area_m2 = abs(poly_area)
print(f"{catchment_area_m2 = }")
xr_camelsgb_forcing["Q"] = xr_camelsgb_forcing["Q"] * conversion_mmday2m3s * catchment_area_m2
xr_hbv_model_output['Modelled_discharge'] = xr_hbv_model_output['Modelled_discharge']* conversion_mmday2m3s * catchment_area_m2
catchment_area_m2 = 248130958.76862004
timeseries.plot()
xr_camelsgb_forcing["Q"].plot(label="Observed discharge")
xr_hbv_model_output['Modelled_discharge'].plot(label="modelled discharge HBV")
plt.ylabel("Discharge [$m^3/s$]")
plt.xlabel("Date")
plt.legend()
<matplotlib.legend.Legend at 0x7fdae54fea80>

Doesn’t look too good for PCRGlobWB. This is because for this small area we are only looking at 10-ish pixels and most likely there is a big mismatch between the pixels that drain through the outlet in PCRGlobWB and the actual catchment. Typical areas for PCRGlobWB are more than 100x100 km.
Cleaning up#
Models usually perform some “wrap up tasks” at the end of a model run, such as writing the last outputs to disk and releasing memory. In the case of eWaterCycle, another important teardown task is destroying the container in which the model was running. This can free up a lot of resources on your system. Therefore it is good practice to always call finalize()
when you’re done with an experiment.
pcrglob.finalize()