import matplotlib
if not hasattr(matplotlib.RcParams, "_get"):
    matplotlib.RcParams._get = dict.get

GRDC Discharge Observations

The Global Runoff Data Centre (GRDC) is the primary source for historical daily and monthly river discharge data worldwide, covering thousands of gauging stations.

Unlike the Caravan or ERA5 data, GRDC data usually requires a manual download step. eWaterCycle already hosts some stations. If yours is not among them, register on the GRDC portal, select the stations you need, and download the data files to your own machine. Once you have the files, send them to an admin, who will make them available for loading in eWaterCycle.

What we need:

  1. A GRDC station ID (found via the GRDC station catalogue)

  2. A time window

# General python
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Niceties
from rich import print

# eWaterCycle observation module
from ewatercycle.observation.grdc import get_grdc_data

Downloading GRDC data

GRDC data normally has to be downloaded by hand. We have already downloaded some GRDC stations; if yours is not in our database, ask the admin to add it, or download it yourself:

  1. Register (free) at portal.grdc.bafg.de

  2. Search for your station by name, river, or country

  3. Select the station and download the daily data as a .txt file

  4. Place the downloaded file(s) in a directory on your machine — that path goes into grdc_data_home below

The GRDC station ID is a 7-digit number visible in the portal and in the filename of the downloaded file (e.g. GRDC_6335020_Q_Day.txt).
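As a small illustration, the station ID can be recovered from file names that follow the pattern above. This is a minimal sketch using only the standard library. The helper name station_id_from_filename is hypothetical and not part of eWaterCycle, and it assumes the ID is the only standalone 7-digit run in the file name.

```python
import re
from pathlib import Path

# Hypothetical helper, not part of eWaterCycle: pull the 7-digit GRDC
# station ID out of a downloaded file name.
_STATION_ID = re.compile(r"(?<!\d)(\d{7})(?!\d)")


def station_id_from_filename(path):
    """Return the 7-digit station ID in the file name, or None if absent."""
    match = _STATION_ID.search(Path(path).name)
    return match.group(1) if match else None


example_id = station_id_from_filename("GRDC_6335020_Q_Day.txt")
print(example_id)
```

Only the file name is inspected, so digits elsewhere in the directory path do not interfere with the match.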

# Rhine at Lobith — one of the most monitored river cross-sections in Europe
station_id = "6335020"

experiment_start_date = "2000-01-01T00:00:00Z"
experiment_end_date   = "2005-12-31T00:00:00Z"
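The two timestamps are ISO 8601 strings in UTC (the trailing Z). One quick sanity check, sketched below with the standard library only, is to parse them and count how many daily values a gap-free record would contain over the window. The names parse_utc, window_start, and window_end are illustrative, and counting the end date as part of the window is an assumption rather than documented get_grdc_data behaviour.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches the strings used above


def parse_utc(timestamp):
    """Parse an ISO 8601 'Z' timestamp into a timezone-aware datetime."""
    return datetime.strptime(timestamp, ISO_FORMAT).replace(tzinfo=timezone.utc)


window_start = parse_utc("2000-01-01T00:00:00Z")
window_end = parse_utc("2005-12-31T00:00:00Z")

if window_end <= window_start:
    raise ValueError("The end of the time window must come after its start.")

# Number of daily values in a complete record, counting both end points
# (assumption: the end date is inclusive)
expected_days = (window_end - window_start).days + 1
print(expected_days)
```

Comparing this count with the length of the loaded time series gives a first impression of how complete the record is.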

Loading the observations

get_grdc_data reads the downloaded GRDC file and returns an xarray Dataset of daily discharge values that also contains station information (name, river, coordinates, drainage area, etc.).

observations_ds = get_grdc_data(
    station_id=station_id,
    start_time=experiment_start_date,
    end_time=experiment_end_date,
)

The station metadata (name, river, country, coordinates) is embedded as scalar variables in the dataset. The discharge time series is stored in the streamflow variable, where -999 marks missing values; we mask those before plotting.

print(observations_ds)

Plotting the discharge

The streamflow variable is an xarray DataArray indexed by time, so it can be plotted directly with its .plot() method.

station_name = str(observations_ds["station_name"].values)
river_name   = str(observations_ds["river_name"].values)

# Mask the -999 missing value sentinel before plotting
streamflow = observations_ds["streamflow"]
streamflow = streamflow.where(streamflow != -999)

plt.figure(figsize=(12, 4))
streamflow.plot(color="steelblue")
plt.title(f"GRDC station {station_id}: {station_name} ({river_name})")
plt.xlabel("Date")
plt.ylabel("Discharge (m³/s)")
plt.tight_layout()
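Masking turns the -999 sentinels into NaN, which also makes it easy to quantify gaps in a record. The sketch below uses a small synthetic pandas Series rather than real GRDC data; in the notebook, a comparable Series could presumably be obtained with streamflow.to_series(), an xarray DataArray method. All values are made up for illustration.

```python
import pandas as pd

MISSING = -999  # missing-value sentinel described above

# Synthetic daily discharge in m³/s, NOT real GRDC data
dates = pd.date_range("2000-01-01", periods=6, freq="D")
raw = pd.Series([2100.0, 2150.0, MISSING, 2230.0, MISSING, 2300.0], index=dates)

# Replace the sentinel with NaN so it is skipped by statistics and plots
masked = raw.where(raw != MISSING)

n_missing = int(masked.isna().sum())
missing_fraction = n_missing / len(masked)
mean_discharge = masked.mean()  # NaN values are ignored by default

print(f"{n_missing} of {len(masked)} values missing ({missing_fraction:.0%})")
print(f"Mean of valid values: {mean_discharge:.1f} m³/s")
```

Without the mask, the two sentinel values would be averaged in as if they were real discharges and drag the mean far below the true level.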