Climate scenarios#

5.1 Description#

The model will be run for the future under various Shared Socioeconomic Pathway Scenarios (SSP’s). These scenarios represent different possible future outcomes depending on the increase in trapped radiation in the atmosphere. CMIP6 provides 5 scenarios ranging from SSP119 to SSP585. These outer extreme scenarios were not included in this study because recent emission trends and climate policy indicate these scenarios have become less plausible (van Vuuren et al., 2026). For a description of the scenarios used in this study see table 3.

Table 5.1: Table 3: description of various CMIP6 SSP’s from 2025-2099 (Government of Canada, 2019).

Scenarios

Description

Global \(\Delta\)T\(^{\circ}\)C

Period

SSP126

“Middle of the road”: Trends do not shift from historical patterns

1,8

2025 - 2099

SSP245

“A rocky road”: Low international priority in addressing environmental concerns. High challenges to mitigation and adaptation, resurgent nationalism, slow economic growth.

2,7

2025 - 2099

SSP370

“A road divided”: Large gap between high and low tech economies, widening global disparities, diverse investment in fossil fuels and renewable energy sources.

3,7

2025 - 2099

5.2 CMIP6 low flow comparison/validation#

To see if the CMIP6 dataset produces realistic model outcomes, historical low flow data will be compared to observed and ERA5 low flow data over a 25 year period from 1989-2014. It should be noted that CMIP6 is not a reanalysis model like ERA5 but a climate model, meaning it is not expected to reproduce the timing of peaks or wet and dry years precisely. Instead, model outcomes are expected to statistically match observed and ERA5 outcomes over a longer time period. Figure 11 shows the cumulative distribution of low flow days for this 25 year period.

To match CMIP6 data to ERA5, quantile delta mapping may be applied. This is a correction method to align the distribution of climate model outputs with observational data and thus minimising distributional biases. This has been applied on forcing inputs using the adjust function from python module cmethods and discharge outputs using interp1d from scipy.interpolate. However, both did not yield better results and were therefore not used.

Startup & Imports#

# General python
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

import numpy as np
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
from scipy.interpolate import interp1d

# Niceties
from rich import print
# General eWaterCycle
import ewatercycle
import ewatercycle.models
import ewatercycle.forcing
# Defining things

basin_size = 132572
q_critical = 500
# Choosing time period

start_year = 1995
end_year = 2014

validation_start = f"{start_year}-01-01T00:00:00Z"
validation_end = f"{end_year}-12-31T00:00:00Z"

# Choose CMIP Scenario: 
Scenario = "historical"    # Options: historical, ssp119, ssp126, ssp245, ssp,370, ssp585

# Attach colours to scenarios to make plotting easier 
colours = {"historical": "tab:blue", "ssp126": "green", "ssp245": "orange", "ssp370": "red"}
# # Create pathways for forcings

forcing_path_ERA5 = Path.home() / "BEP-maxime" / "book" / "thesis_projects" / "BSc" / "2026_Q4_MaximedeBekker_CEG" /  "Workyard" / "forcings" / "ERA5" / f"ERA5-{start_year}-{end_year}"

forcing_path_CMIP = Path.home() / "BEP-maxime" / "book" / "thesis_projects" / "BSc" / "2026_Q4_MaximedeBekker_CEG" / "Workyard" / "forcings" / "CMIP6" / Scenario / f"CMIP6-{start_year}-{end_year}"

forcing_path_QDM = Path.home() / "BEP-maxime" / "book" / "thesis_projects" / "BSc" / "2026_Q4_MaximedeBekker_CEG" / "Workyard" / "forcings" / "CMIP6" / Scenario / f"CMIP6-QDM-{start_year}-{end_year}"

discharge_file = Path.home() / "BEP-maxime" / "book" / "thesis_projects" / "BSc" / "2026_Q4_MaximedeBekker_CEG" /  "Workyard" / "07DA001_discharge_daily_withoutmissing.csv"

shape_file = Path.home() / "BEP-maxime" / "book" / "thesis_projects" / "BSc" / "2026_Q4_MaximedeBekker_CEG" /  "Workyard" / "Shapefiles" / "07DA001_basin.shp"

hbv_config = Path.home() / "BEP-maxime" / "book" / "thesis_projects" / "BSc" / "2026_Q4_MaximedeBekker_CEG" /  "Workyard" / "hbv_config"
hbv_config.mkdir(parents=True, exist_ok=True)
# Load CSV discharge 07DA001

q_obs = pd.read_csv(discharge_file, skiprows=1)
q_obs = q_obs[["Date", "Value"]].copy()
q_obs["Date"] = pd.to_datetime(q_obs["Date"])
q_obs = q_obs.rename(columns={"Value": "discharge_m3s"})
# Define time period
validation_start_date = pd.to_datetime(validation_start.replace("Z", ""))
validation_end_date = pd.to_datetime(validation_end.replace("Z", ""))

# Skip 1 year for filling storages
evaluation_start = pd.to_datetime(f"{validation_start_date.year + 1}-01-01")

# Align q_obs to relevant dates
q_obs = q_obs[(q_obs["Date"] >= validation_start_date) & (q_obs["Date"] <= validation_end_date)]
observed_output = pd.Series(data=q_obs["discharge_m3s"].to_numpy(), name="Observed discharge", index=q_obs["Date"])
# Load data
ERA5_forcing = ewatercycle.forcing.sources["LumpedMakkinkForcing"].load(directory=forcing_path_ERA5)

CMIP_forcing = ewatercycle.forcing.sources["LumpedMakkinkForcing"].load(directory=forcing_path_CMIP)
# Load calibration constants

par_ensemble = [
    [6.279135, 0.4808243, 174.127749, 1.9527195, 0.3305087, 6.19919, 0.0768362, 0.004366398, 0.4076606],
    [7.35776, 0.432509, 192.67085, 1.66088, 0.289296, 5.323766, 0.037268, 0.004399, 1.146504],
    [7.9355, 0.4593, 219.6962, 1.72624, 0.26391, 5.810765, 0.04804, 0.0155065, 0.76857],
    [5.5464, 0.46496, 187.8548, 1.82803, 0.440628, 6.29496, 0.062766, 0.033095, 0.80392],
   [7.23868, 0.47495, 181.82012, 1.8232, 0.4884032, 5.546412, 0.0449439, 0.00231717, 1.25052]]


par_names = ["Imax",         # Maximum interception storage
               "Ce",         # Evaporation correction factor
               "Sumax",      # Maximum soil moisture storage
               "Beta",       # Soil runoff parameter
               "Pmax",       # Maximum percolation rate
               "Tlag",       # Time lag
               "Kf",         # Fast reservoir recession coefficient
               "Ks",         # Slow reservoir recession coefficient
               "FM"]          # Snowmelt factor
# Storages

#               Si,  Su, Sf, Ss, Sp
s_0 = np.array([0,  100,  0,  5,  0])

Model setup#

def run_hbv(parameters, initial_storages, forcing):

    # Creating model object
    model = ewatercycle.models.HBV(forcing=forcing)

    # Creating config file
    config_file, _ = model.setup(
        parameters=parameters,
        initial_storages=initial_storages,
        cfg_dir=hbv_config)

    # Initialising model
    model.initialize(config_file)

    # Define & update outputs
    Q_m = []
    time = []

    while model.time < model.end_time:
        model.update()
        Q_m.append(model.get_value("Q")[0])
        time.append(pd.Timestamp(model.time_as_datetime))

    model.finalize()

    # Convert mm/day to m3/s
    model_output_mmday = pd.Series(
        data=Q_m,
        index=time,
        name="Modelled discharge")

    model_output_m3s = model_output_mmday * basin_size * 1000 / 86400

    return model_output_m3s

Running model#

def run_hbv_ensemble(par_ensemble, initial_storages, forcing):

    # Define amount of parameter sets
    N = len(par_ensemble)
    
    # Create dataframe to append data to & add column for observed data
    ensemble_data = pd.DataFrame()

    for i in range(N):

        # Run HBV model for the parameter sets 
        simulated = run_hbv(
            parameters=par_ensemble[i],
            initial_storages=initial_storages,
            forcing=forcing)

        # Filter data by day only, not by day & time to prevent alignment issues
        simulated_daily = simulated

        simulated_daily.index = pd.to_datetime(simulated_daily.index).tz_localize(None).normalize()
        simulated_daily.name = f"Set {i+1}"
         
        # Append new column for every parameter set results
        ensemble_data[f"Set {i+1}"] = simulated

    # Filter observed data by day
    observed_daily = observed_output
    observed_daily.index = pd.to_datetime(observed_daily.index).tz_localize(None).normalize()

    # Add mean of all sets
    ensemble_data["Mean"] = ensemble_data.mean(axis=1)
    ensemble_data['Observed discharge'] = observed_daily

    return ensemble_data
ensemble_data_ERA5 = run_hbv_ensemble(
    par_ensemble=par_ensemble,
    initial_storages=s_0,
    forcing=ERA5_forcing)

ensemble_data_CMIP = run_hbv_ensemble(
    par_ensemble=par_ensemble,
    initial_storages=s_0,
    forcing=CMIP_forcing)

Lowflow calculation#

def lowflow_counter_ensemble(ensemble_data, start_date, end_date):
    
    lowflow_days = []

    years = list(range(start_date.year, end_date.year + 1))

    for i in range(len(years)):
    
        year = years[i]
    
        # Define start and end month-day
        year_start = pd.to_datetime(f"{year}-05-18")
        year_end = pd.to_datetime(f"{year}-10-17")
    
        year_data = ensemble_data[
            (ensemble_data.index >= year_start) &
            (ensemble_data.index <= year_end)]

        # Set zeros to count from
        observed_lowflow_days = 0
        modelled_lowflow_days = []
        modelmean_lowflow_days = 0

        # Count observed lowflow days
        for j in range(len(year_data)):
            observed_q = year_data.iloc[j]["Observed discharge"]

            if observed_q < q_critical:
                observed_lowflow_days += 1

        # Count model mean lowflow days
        for j in range(len(year_data)):
            modelmean_q = year_data.iloc[j]["Mean"]

            if modelmean_q < q_critical:
                modelmean_lowflow_days += 1

        # Count modelled lowflow days
        for j in range(len(par_ensemble)):
            set_lowflow_days = 0

            for k in range(len(year_data)):
                set_q = year_data.iloc[k][f"Set {j+1}"]

                if set_q < q_critical:
                    set_lowflow_days += 1

            modelled_lowflow_days.append(set_lowflow_days)

        setavg_lowflow_days = np.mean(modelled_lowflow_days)
        
        lowflow_days.append({
            "year": year,
            "observed": observed_lowflow_days,
            "modelmean": modelmean_lowflow_days,
            "set_1": modelled_lowflow_days[0],
            "set_2": modelled_lowflow_days[1],
            "set_3": modelled_lowflow_days[2],
            "set_4": modelled_lowflow_days[3],
            "set_5": modelled_lowflow_days[4],
            "set_avg": np.round(setavg_lowflow_days)})

    lowflow_days = pd.DataFrame(lowflow_days)

    days_sum = lowflow_days[["observed", "modelmean", "set_1", "set_2", "set_3", "set_4", "set_5", "set_avg"]].sum()

    return lowflow_days
lowflow_ERA5 = lowflow_counter_ensemble(
    ensemble_data=ensemble_data_ERA5,
    start_date=evaluation_start,
    end_date=validation_end_date)

lowflow_CMIP = lowflow_counter_ensemble(
    ensemble_data=ensemble_data_CMIP,
    start_date=evaluation_start,
    end_date=validation_end_date)

CDF Calculation#

def rank(lowflow_ERA5, lowflow_CMIP):

    # Combine tables
    rank_table = pd.DataFrame({
        "Observed": np.sort(lowflow_ERA5["observed"]),
        "ERA5": np.sort(lowflow_ERA5["set_avg"]),
        "CMIP6": np.sort(lowflow_CMIP["set_avg"]),})
    
    rank_table.insert(0, "rank", range(1, len(rank_table) + 1))

    return rank_table
rank_table = rank(lowflow_ERA5, lowflow_CMIP)
def plot_cumsum(rank_table):

    plt.figure(figsize=(8, 5))

    # Plot cumsum for all columns, skipping rank column
    for i in range(1, len(rank_table.columns)):
        plt.plot(rank_table["rank"], rank_table[rank_table.columns[i]].cumsum(), marker="o", label=rank_table.columns[i])

    # Plotting
    plt.xlabel("Ranked years")
    plt.ylabel("Cumulative low flow days")
    plt.xticks(rank_table["rank"])
    plt.title("Cumulative annual low flow days")
    # plt.grid(True)
    plt.legend()
    plt.show()
plot_cumsum(rank_table)
../../../../_images/4f10abaaae3e792a3c350d0b7d12348b8fcf20e9fea783e8b3b6dd1c3473fd87.png

Figure 11: Cumulative distribution of low flow days between 1989-2014 for historical, ERA5 and CMIP6 data.

Figure 11 shows a slight deviation in CMIP6 low flow days compared to the observed and ERA5 curves, particularly in the lower and higher ranked years. This suggests that CMIP6 may slightly overestimate extremes i.e. driest years appear drier and wettest years appear wetter. Despite this, CMIP6 follows the general curve trend reasonably well and precisely reproduces the total number of low flow days. The observed data shows 532 total low flow days over this period, ERA5 537 days and CMIP6 532 days. Future CMIP6 forcings are therefore considered suitable for analysing trends in low flow days over long periods although extreme years should be interpreted with caution.