Model Profile Files

Model profiles are YAML files stored in the Reshapr/model_profiles/ directory. They describe the structure of the output files of the models that Reshapr can operate on in relation to the features and assumptions built into the Reshapr code.

Here is an example of a model profile file:

description: SalishSeaCast version 201812 NEMO results on storage accessible from salish.
             2015-01-01 to 2019-06-30.

time coord:
  name: time_counter
y coord:
  name: y
x coord:
  name: x

# Chunking scheme used for the netCDF4 files
# Note that coordinate names (keys) are conceptual here.
# They are replaced with actual coordinate names in files in the code;
# e.g. time is replaced by time_counter for dataset loading
chunk size:
  time: 1
  depth: 40
  y: 898
  x: 398

geo ref dataset:
  path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02
  y coord: gridY
  x coord: gridX

extraction time origin: 2015-01-01

results archive:
  path: /results/SalishSea/nowcast-green.201812/
  datasets:
    day:
      auxiliary:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      biology:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
        depth coord: deptht
      chemistry:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      grazing and mortality:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_dia2_T.nc"
        depth coord: deptht
      physics tracers:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
        depth coord: deptht
      u velocity:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
        depth coord: depthu
      v velocity:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_V.nc"
        depth coord: depthv
      vertical turbulence:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw
      w velocity:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw
    hour:
      auxiliary:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      biology:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
        depth coord: deptht
      chemistry:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      physics tracers:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
        depth coord: deptht
      primary production:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_prod_T.nc"
        depth coord: deptht
      u velocity:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
        depth coord: depthu
      v velocity:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_V.nc"
        depth coord: depthv
      vertical turbulence:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw
      w velocity:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw

File Structure

Model profile files are nested collections of key-value pairs with the values often being another collection of key-value pairs. They map directly to a nested Python dictionary in the code. Because of that mapping the key-value pairs are also referred to as items.
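As a minimal sketch of that mapping, the fragment below writes part of the example profile as the nested Python dictionary it would correspond to after loading. The variable name profile is illustrative only; it is not taken from the Reshapr code.

```python
# Hypothetical in-memory form of part of the example model profile.
profile = {
    "time coord": {"name": "time_counter"},
    "chunk size": {"time": 1, "depth": 40, "y": 898, "x": 398},
    "results archive": {
        "path": "/results/SalishSea/nowcast-green.201812/",
        "datasets": {
            "day": {
                "biology": {
                    "file pattern": "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc",
                    "depth coord": "deptht",
                },
            },
        },
    },
}

# Items are reached by chaining keys, one key per level of nesting.
time_name = profile["time coord"]["name"]
depth_name = profile["results archive"]["datasets"]["day"]["biology"]["depth coord"]
print(time_name, depth_name)
```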

The sections below describe the model profile items and collections of items, referred to as stanzas. The descriptions include conventions used in the keys and values, whether the items are required or optional, what the default values for optional items are, etc.

The default values are generally those associated with the SalishSeaCast NEMO model datasets.

description Item (Required)

The description of the model profile.

Example:

description: SalishSeaCast version 202111 NEMO results on storage accessible from salish.
             2007-01-01 onward.

Multi-paragraph descriptions are supported. Please note that all lines of the description are indented.

description: ECCC HRDPS (High Resolution Deterministic Prediction System)
             2.5 km resolution GEMLAM pre-operational model product fields processed
             for use as surface forcing fields for the SalishSeaCast models.

             This model profile is for the SalishSeaCast NEMO forcing files generated
             from the HRDPS model pre-operational period 2011-09-22 to 2014-11-18 product
             fields provided from archives maintained by the ECCC Vancouver office.

             For the HRDPS model product fields from the 2007-01-03 to 2011-09-21
             pre-operational period please use the HRDPS-2.5km-GEMLAM-pre22sep11.yaml
             profile.

             For the HRDPS operational model product fields downloaded from the
             ECCC Datamart servers daily from 2014-09-12 to present please use the
             HRDPS-2.5km-operational.yaml profile.

name Item (Deprecated)

The name of the model profile.

Example:

name: SalishSeaCast.201812

time coord Stanza (Required)

The name of the netCDF time coordinate in the model dataset.

Example:

time coord:
  name: time_counter

y coord Stanza (Required)

A collection of items that define the (required) name of the netCDF y-direction coordinate in the model dataset, and the (optional) units and comment metadata for the y-direction coordinate in datasets produced by Reshapr.

Examples:

  • A y coord stanza that uses default values for the units and comment metadata items:

    y coord:
      name: y
    
  • A y coord stanza that provides values for the units and comment metadata items:

    y coord:
      name: y
      units: metres
      comment: gridY values are distance in metres in the model y-direction from the south-west corner of the grid
    

Stanza items:

name (Required)

The name of the netCDF y-direction coordinate in the model dataset.

units (Optional)

The value for the units item in the metadata of the y-direction coordinate in datasets produced by Reshapr.

The default value when units is omitted is count, the conventional unit for a grid index in netCDF files.

comment (Optional)

The value for the comment item in the metadata of the y-direction coordinate in datasets produced by Reshapr.

The default value when comment is omitted is gridY values are grid indices in the model y-direction.

x coord Stanza (Required)

A collection of items that define the (required) name of the netCDF x-direction coordinate in the model dataset, and the (optional) units and comment metadata for the x-direction coordinate in datasets produced by Reshapr.

Examples:

  • An x coord stanza that uses default values for the units and comment metadata items:

    x coord:
      name: x
    
  • An x coord stanza that provides values for the units and comment metadata items:

    x coord:
      name: x
      units: metres
      comment: gridX values are distance in metres in the model x-direction from the south-west corner of the grid
    

Stanza items:

name (Required)

The name of the netCDF x-direction coordinate in the model dataset.

units (Optional)

The value for the units item in the metadata of the x-direction coordinate in datasets produced by Reshapr.

The default value when units is omitted is count, the conventional unit for a grid index in netCDF files.

comment (Optional)

The value for the comment item in the metadata of the x-direction coordinate in datasets produced by Reshapr.

The default value when comment is omitted is gridX values are grid indices in the model x-direction.
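The defaults described for the y coord and x coord stanzas can be sketched as dictionary lookups with fallbacks. This is an illustration under stated assumptions, not the Reshapr implementation: the function name coord_attrs and its axis argument are hypothetical.

```python
def coord_attrs(stanza, axis):
    """Return units and comment metadata for a y coord or x coord stanza.

    `stanza` is the stanza as a dictionary; `axis` is "Y" or "X" and only
    selects the wording of the default comment. Both names are hypothetical.
    """
    default_comment = (
        f"grid{axis} values are grid indices in the model {axis.lower()}-direction"
    )
    return {
        # Fall back to the documented defaults when an item is omitted.
        "units": stanza.get("units", "count"),
        "comment": stanza.get("comment", default_comment),
    }


print(coord_attrs({"name": "x"}, "X"))
print(coord_attrs({"name": "y", "units": "metres"}, "Y"))
```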

chunk size Stanza (Required)

A collection of items that define the netCDF chunk size parameters for reading dataset files.

Chunk size plays an important and somewhat complicated role in the generation of the dask task graph. Please see the dask docs about chunk size selection and orientation, and the more detailed docs about chunks.

Examples:

  • A chunk size stanza for a dataset that contains fields with a depth coordinate:

    chunk size:
      time: 1
      depth: 40
      y: 898
      x: 398
    
  • A chunk size stanza for a dataset that contains surface fields with no depth coordinate:

    chunk size:
      time: 24
      y: 266
      x: 256
    

Stanza items:

time (Required)

The chunk size of the time coordinate in the model dataset.

depth (Optional)

The chunk size of the depth coordinate in the model dataset.

This item is not required for datasets that contain surface fields with no depth coordinate.

y (Required)

The chunk size of the y-direction coordinate in the model dataset.

x (Required)

The chunk size of the x-direction coordinate in the model dataset.
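The comment in the example profile notes that the chunk size keys are conceptual and are replaced by the actual coordinate names from the files. A minimal sketch of that substitution follows; the helper name dataset_chunks and its depth_coord argument are assumptions made for illustration, and the real code may do this differently.

```python
def dataset_chunks(profile, depth_coord=None):
    """Translate conceptual chunk size keys into dataset coordinate names.

    `depth_coord` is the depth coordinate name of the variables group being
    loaded, or None for surface fields. Hypothetical helper.
    """
    chunk_size = profile["chunk size"]
    chunks = {
        profile["time coord"]["name"]: chunk_size["time"],
        profile["y coord"]["name"]: chunk_size["y"],
        profile["x coord"]["name"]: chunk_size["x"],
    }
    # The depth item is optional, so only map it when both pieces exist.
    if depth_coord is not None and "depth" in chunk_size:
        chunks[depth_coord] = chunk_size["depth"]
    return chunks


profile = {
    "time coord": {"name": "time_counter"},
    "y coord": {"name": "y"},
    "x coord": {"name": "x"},
    "chunk size": {"time": 1, "depth": 40, "y": 898, "x": 398},
}
print(dataset_chunks(profile, depth_coord="deptht"))
```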

geo ref dataset Stanza (Required)

A collection of items that define the dataset that provides the geolocation mapping between grid y/x indices and longitude/latitude values for the fields in the dataset.

Examples:

  • geo ref dataset stanza for a dataset whose geolocation data is available from an ERDDAP server dataset:

    geo ref dataset:
      path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02
      y coord: gridY
      x coord: gridX
    
  • geo ref dataset stanza for a dataset whose geolocation data is available from a netCDF file. This example also shows how non-default names of the longitude/latitude variables in the geolocation dataset are handled.

    geo ref dataset:
      path: /results/forcing/atmospheric/GEM2.5/gemlam/gemlam_y2007m01d03.nc
      y coord: y
      longitude var: nav_lon
      x coord: x
      latitude var: nav_lat
    

Stanza items:

path (Required)

The ERDDAP server URL or file system path of the dataset that provides the geolocation mapping between grid y/x indices and longitude/latitude values.

y coord (Required)

The name of the netCDF y-direction coordinate in the geolocation dataset.

longitude var (Optional)

The name of the netCDF longitude variable in the geolocation dataset if it is something other than longitude.

x coord (Required)

The name of the netCDF x-direction coordinate in the geolocation dataset.

latitude var (Optional)

The name of the netCDF latitude variable in the geolocation dataset if it is something other than latitude.
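The optional longitude var and latitude var items can be read the same way as other optional items, falling back to longitude and latitude when they are omitted. The function name geo_ref_var_names below is hypothetical; this is a sketch of the documented defaults, not the Reshapr code.

```python
def geo_ref_var_names(geo_ref):
    """Return (longitude, latitude) variable names for a geo ref dataset stanza."""
    return (
        geo_ref.get("longitude var", "longitude"),
        geo_ref.get("latitude var", "latitude"),
    )


erddap_geo_ref = {
    "path": "https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02",
    "y coord": "gridY",
    "x coord": "gridX",
}
file_geo_ref = {
    "path": "/results/forcing/atmospheric/GEM2.5/gemlam/gemlam_y2007m01d03.nc",
    "y coord": "y",
    "longitude var": "nav_lon",
    "x coord": "x",
    "latitude var": "nav_lat",
}
print(geo_ref_var_names(erddap_geo_ref))
print(geo_ref_var_names(file_geo_ref))
```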

extraction time origin Item (Required)

The date to use as the netCDF time origin for datasets extracted from the model dataset. The value is used to calculate the netCDF units attribute of the time coordinate in extracted datasets; e.g. days since 2015-01-01 12:00:00. It is also used to calculate the time_origin and comment attributes of the time coordinate. The value of the time_origin attribute is the date and time parts of the units attribute; e.g. 2015-01-01 12:00:00. The value of the comment attribute is an explanation of the dataset’s time coordinate values; e.g.

time values are UTC at the centre of the intervals over which the
calculated model results are averaged;
e.g. the field average values for 8 February 2022 have
a time value of 2022-02-08 12:00:00Z

The quanta (days, hours) and the time components in the units, time_origin, and comment attributes are determined by the value of the time base item in the extract Process Configuration File.

Example:

extraction time origin: 2015-01-01
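The relationship between the extraction time origin value and the units and time_origin attributes can be sketched as simple string construction. The function name time_coord_attrs and its quantum and time_part parameters are hypothetical. The defaults reproduce only the days since 2015-01-01 12:00:00 example above; the values used for other time bases are not shown here and would have to come from the Reshapr code.

```python
from datetime import date


def time_coord_attrs(time_origin, quantum="days", time_part="12:00:00"):
    """Build units and time_origin attribute values from an origin date.

    Hypothetical helper: `quantum` and `time_part` stand in for the values
    that depend on the time base of the extraction.
    """
    origin = f"{time_origin.isoformat()} {time_part}"
    return {
        "units": f"{quantum} since {origin}",
        # time_origin is the date and time parts of the units attribute.
        "time_origin": origin,
    }


attrs = time_coord_attrs(date(2015, 1, 1))
print(attrs["units"])
print(attrs["time_origin"])
```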

results archive Stanza (Required)

A nested collection of items that describe the file system paths and file names in which various groups of model variables are stored. The depth coordinate for each of the variable groups is also specified here.

Example:

results archive:
   path: /results/SalishSea/nowcast-green.201812/
   datasets:
      day:
         biology:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
            depth coord: deptht
         u velocity:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
            depth coord: depthu
      hour:
         physics tracers:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
            depth coord: deptht
         u velocity:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
            depth coord: depthu

Stanza items:

path (Required)

The absolute file system path of the directory tree in which the model dataset files are stored.

datasets (Required)

The sub-stanza key for the collection of model datasets, grouped by time base (day, hour, month).

<time base> (Required)

The sub-stanza key for the time base groups of model dataset variable groups. Must be one of day, hour, or month. The time base keys are used as the value for the time base item in the extract Process Configuration File as part of the specification of which dataset to extract variables from.

days per file (Optional)

An integer or string that specifies the number of days of model results that are stored in each file of the dataset. At present the only accepted values are 1 and month. Example:

results archive:
  path: /results2/SalishSea/month-avg.202111/
  datasets:
    month:
      days per file: month
      biology:
        file pattern: "SalishSeaCast_1m_biol_T_{yyyymm01}_{yyyymm_end}.nc"
        depth coord: depth

If days per file is not provided, its value defaults to 1.
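A minimal sketch of how the time base and days per file rules above could be checked when reading a profile. The function name days_per_file and the two constant names are assumptions for illustration; Reshapr may validate these items differently.

```python
# Accepted values as stated in this section.
ALLOWED_TIME_BASES = {"day", "hour", "month"}
ALLOWED_DAYS_PER_FILE = {1, "month"}


def days_per_file(profile, time_base):
    """Return the days per file value for a time base, defaulting to 1."""
    if time_base not in ALLOWED_TIME_BASES:
        raise ValueError(f"unsupported time base: {time_base!r}")
    time_base_stanza = profile["results archive"]["datasets"][time_base]
    value = time_base_stanza.get("days per file", 1)
    if value not in ALLOWED_DAYS_PER_FILE:
        raise ValueError(f"unsupported days per file value: {value!r}")
    return value


profile = {
    "results archive": {
        "path": "/results2/SalishSea/month-avg.202111/",
        "datasets": {
            "month": {"days per file": "month"},
            "day": {},
        },
    },
}
print(days_per_file(profile, "month"))
print(days_per_file(profile, "day"))
```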

<variables group> (Required)

The sub-stanza key(s) for the collections of model variables in particular dataset files. The variable group keys are used as the value for the variable group item in the extract Process Configuration File as part of the specification of which dataset to extract variables from.

file pattern (Required)

The dataset path/file pattern for the model variables in a group. The file patterns are relative to the model dataset path described above. Elements of the pattern in brace brackets, e.g. {yyyymmdd}, are replaced by dates in the format indicated. For example, for the date 2022-05-27 these are some of the date format pattern elements and the resulting formatted date strings:

  • ddmmmyy; e.g. 27may22

  • yyyymmdd; e.g. 20220527

  • yyyy; e.g. 2022

  • nemo_yyyymmdd; e.g. y2022m05d27

  • nemo_yyyymm; e.g. y2022m05

The supported date format pattern elements are the names of the Date Formatters functions.
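The substitution of pattern elements can be sketched with Python's str.format. The function below is a hypothetical stand-in that computes only the five elements listed above; it is not the Date Formatters code, and the names format_file_pattern and MONTH_ABBREVIATIONS are assumptions.

```python
from datetime import date

# Lower case month abbreviations, spelled out to avoid locale dependence.
MONTH_ABBREVIATIONS = (
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
)


def format_file_pattern(file_pattern, day):
    """Replace the date elements of a file pattern with formatted dates."""
    month = MONTH_ABBREVIATIONS[day.month - 1]
    elements = {
        "ddmmmyy": f"{day.day:02d}{month}{day.year % 100:02d}",
        "yyyymmdd": f"{day:%Y%m%d}",
        "yyyy": f"{day:%Y}",
        "nemo_yyyymmdd": f"y{day:%Y}m{day:%m}d{day:%d}",
        "nemo_yyyymm": f"y{day:%Y}m{day:%m}",
    }
    # Elements that do not appear in the pattern are simply ignored.
    return file_pattern.format(**elements)


pattern = "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
print(format_file_pattern(pattern, date(2022, 5, 27)))
```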

depth coord (Required for all but purely surface datasets)

The name of the netCDF depth coordinate in the variables group dataset.

For datasets that contain only surface fields (i.e. none of the variables have a depth coordinate) the depth coord item is omitted.

Example:

results archive:
  path: /results/forcing/atmospheric/GEM2.5/operational/
  datasets:
    hour:
      surface fields:
        file pattern: "ops_{nemo_yyyymmdd}.nc"

The depth coord item is required for datasets that contain a mixture of surface and depth-varying variables.
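Putting the results archive items together, the sketch below looks up a variables group for a time base, joins its file pattern to the archive path, and treats a missing depth coord item as a surface-only dataset. The function name dataset_spec and the keys of the dictionary it returns are hypothetical.

```python
from pathlib import PurePosixPath


def dataset_spec(profile, time_base, vars_group):
    """Return the full path pattern and depth coord name for a variables group."""
    archive = profile["results archive"]
    group = archive["datasets"][time_base][vars_group]
    return {
        # File patterns are relative to the results archive path.
        "path pattern": str(PurePosixPath(archive["path"]) / group["file pattern"]),
        # None signals a dataset that contains only surface fields.
        "depth coord": group.get("depth coord"),
    }


profile = {
    "results archive": {
        "path": "/results/forcing/atmospheric/GEM2.5/operational/",
        "datasets": {
            "hour": {
                "surface fields": {"file pattern": "ops_{nemo_yyyymmdd}.nc"},
            },
        },
    },
}
spec = dataset_spec(profile, "hour", "surface fields")
print(spec["path pattern"])
print(spec["depth coord"])
```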