Model Profile Files
Model profiles are YAML files stored in the Reshapr/model_profiles/ directory.
They describe the structure of the output files of the models that Reshapr can operate on, in relation to the features and assumptions built into the Reshapr code.
Here is an example of a model profile file:
description: SalishSeaCast version 201812 NEMO results on storage accessible from salish.
  2015-01-01 to 2019-06-30.
time coord:
  name: time_counter
y coord:
  name: y
x coord:
  name: x
# Chunking scheme used for the netCDF4 files
# Note that coordinate names (keys) are conceptual here.
# They are replaced with actual coordinate names in files in the code;
# e.g. time is replaced by time_counter for dataset loading
chunk size:
  time: 1
  depth: 40
  y: 898
  x: 398
geo ref dataset:
  path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02
  y coord: gridY
  x coord: gridX
extraction time origin: 2015-01-01
results archive:
  path: /results/SalishSea/nowcast-green.201812/
  datasets:
    day:
      auxiliary:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      biology:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
        depth coord: deptht
      chemistry:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      grazing and mortality:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_dia2_T.nc"
        depth coord: deptht
      physics tracers:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
        depth coord: deptht
      u velocity:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
        depth coord: depthu
      v velocity:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_V.nc"
        depth coord: depthv
      vertical turbulence:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw
      w velocity:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw
    hour:
      auxiliary:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      biology:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
        depth coord: deptht
      chemistry:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_carp_T.nc"
        depth coord: deptht
      physics tracers:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
        depth coord: deptht
      primary production:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_prod_T.nc"
        depth coord: deptht
      u velocity:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
        depth coord: depthu
      v velocity:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_V.nc"
        depth coord: depthv
      vertical turbulence:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw
      w velocity:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
        depth coord: depthw
File Structure
Model profile files are nested collections of key-value pairs with the values often being another collection of key-value pairs. They map directly to a nested Python dictionary in the code. Because of that mapping the key-value pairs are also referred to as items.
The sections below describe the model profile items and collections of items, referred to as stanzas. The descriptions include conventions used in the keys and values, whether the items are required or optional, what the default values for optional items are, etc.
The default values are generally those associated with the SalishSeaCast NEMO model datasets.
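As a rough illustration of the mapping described above, the sketch below writes a small fragment of the example profile as the nested Python dictionary it corresponds to, then reads items from it by following a chain of keys. The profile literal and the get_item helper are hypothetical names used only for illustration. A YAML parser such as PyYAML's yaml.safe_load would be one way to produce a dictionary like this from a profile file, but that is an assumption about tooling, not a statement about how Reshapr loads profiles.

```python
from functools import reduce

# Hand-written dictionary equivalent of a fragment of the example profile.
profile = {
    "time coord": {"name": "time_counter"},
    "chunk size": {"time": 1, "depth": 40, "y": 898, "x": 398},
    "results archive": {
        "path": "/results/SalishSea/nowcast-green.201812/",
        "datasets": {
            "day": {
                "biology": {
                    "file pattern": "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc",
                    "depth coord": "deptht",
                },
            },
        },
    },
}


def get_item(mapping, *keys):
    """Hypothetical helper: follow a chain of keys into a nested dictionary."""
    return reduce(lambda stanza, key: stanza[key], keys, mapping)


# Each stanza is a dictionary; each item is a key-value pair inside it.
time_name = get_item(profile, "time coord", "name")
depth_name = get_item(
    profile, "results archive", "datasets", "day", "biology", "depth coord"
)
print(time_name, depth_name)
```

A missing key raises KeyError in this sketch, which is one simple way to notice that a required item has been left out of a profile.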
description
Item (Required)
The description of the model profile.
Example:
description: SalishSeaCast version 202111 NEMO results on storage accessible from salish.
  2007-01-01 onward.
Multi-paragraph descriptions are supported. Please note that all lines of the description are indented.
description: ECCC HRDPS (High Resolution Deterministic Prediction System)
  2.5 km resolution GEMLAM pre-operational model product fields processed
  for use as surface forcing fields for the SalishSeaCast models.

  This model profile is for the SalishSeaCast NEMO forcing files generated
  from the HRDPS model pre-operational period 2011-09-22 to 2014-11-18 product
  fields provided from archives maintained by the ECCC Vancouver office.

  For the HRDPS model product fields from the 2007-01-03 to 2011-09-21
  pre-operational period please use the HRDPS-2.5km-GEMLAM-pre22sep11.yaml
  profile.

  For the HRDPS operational model product fields downloaded from the
  ECCC Datamart servers daily from 2014-09-12 to present please use the
  HRDPS-2.5km-operational.yaml profile.
name
Item (Deprecated)
The name of the model profile.
Example:
name: SalishSeaCast.201812
time coord
Stanza (Required)
The name of the netCDF time coordinate in the model dataset.
Example:
time coord:
  name: time_counter
y coord
Stanza (Required)
A collection of items that define the (required) name of the netCDF y-direction coordinate in the model dataset, and the (optional) units and comment metadata for the y-direction coordinate in datasets produced by Reshapr.
Examples:
A y coord stanza that uses default values for the units and comment metadata items:
y coord:
  name: y
A y coord stanza that provides values for the units and comment metadata items:
y coord:
  name: y
  units: metres
  comment: gridY values are distance in metres in the model y-direction from the south-west corner of the grid
Stanza items:
name
(Required) The name of the netCDF y-direction coordinate in the model dataset.
units
(Optional) The value for the units item in the metadata of the y-direction coordinate in datasets produced by Reshapr. The default value when units is omitted is count, the conventional unit for a grid index in netCDF files.
comment
(Optional) The value for the comment item in the metadata of the y-direction coordinate in datasets produced by Reshapr. The default value when comment is omitted is: gridY values are grid indices in the model y-direction
x coord
Stanza (Required)
A collection of items that define the (required) name of the netCDF x-direction coordinate in the model dataset, and the (optional) units and comment metadata for the x-direction coordinate in datasets produced by Reshapr.
Examples:
A x coord stanza that uses default values for the units and comment metadata items:
x coord:
  name: x
A x coord stanza that provides values for the units and comment metadata items:
x coord:
  name: x
  units: metres
  comment: gridX values are distance in metres in the model x-direction from the south-west corner of the grid
Stanza items:
name
(Required) The name of the netCDF x-direction coordinate in the model dataset.
units
(Optional) The value for the units item in the metadata of the x-direction coordinate in datasets produced by Reshapr. The default value when units is omitted is count, the conventional unit for a grid index in netCDF files.
comment
(Optional) The value for the comment item in the metadata of the x-direction coordinate in datasets produced by Reshapr. The default value when comment is omitted is: gridX values are grid indices in the model x-direction
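The default handling described for the y coord and x coord stanzas can be sketched as follows. This is a minimal illustration, not Reshapr's implementation: the coord_metadata function and its axis argument are hypothetical, and the default strings are copied from the descriptions above.

```python
def coord_metadata(stanza, axis):
    """Hypothetical helper: resolve units and comment for a y or x coord stanza.

    axis is "y" or "x". Items missing from the stanza fall back to the
    defaults described in the text.
    """
    default_comment = (
        f"grid{axis.upper()} values are grid indices in the model {axis}-direction"
    )
    return {
        "units": stanza.get("units", "count"),
        "comment": stanza.get("comment", default_comment),
    }


# Stanza with only the required name item: both defaults apply.
defaults = coord_metadata({"name": "y"}, "y")

# Stanza that provides its own units and comment values.
custom = coord_metadata(
    {
        "name": "x",
        "units": "metres",
        "comment": "gridX values are distance in metres in the model "
        "x-direction from the south-west corner of the grid",
    },
    "x",
)
print(defaults)
print(custom["units"])
```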
chunk size
Stanza (Required)
A collection of items that define the netCDF chunk size parameters for reading dataset files.
Chunk size plays an important and somewhat complicated role in the generation of the dask task graph. Please see the dask docs about chunk size selection and orientation, and the more detailed docs about chunks.
Examples:
A chunk size stanza for a dataset that contains fields with a depth coordinate:
chunk size:
  time: 1
  depth: 40
  y: 898
  x: 398
A chunk size stanza for a dataset that contains surface fields with no depth coordinate:
chunk size:
  time: 24
  y: 266
  x: 256
Stanza items:
time
(Required) The chunk size of the time coordinate in the model dataset.
depth
(Optional) The chunk size of the depth coordinate in the model dataset. This item is not required for datasets that contain surface fields with no depth coordinate.
y
(Required) The chunk size of the y-direction coordinate in the model dataset.
x
(Required) The chunk size of the x-direction coordinate in the model dataset.
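The comments in the example profile note that the chunk size keys are conceptual and are replaced with actual coordinate names in the code. A minimal sketch of that substitution is shown below, assuming the actual names come from the time coord, y coord, and x coord stanzas and from the depth coord item of the chosen variable group. The actual_chunks function is hypothetical and is not Reshapr's API.

```python
def actual_chunks(profile, depth_coord=None):
    """Hypothetical helper: swap conceptual chunk keys for dataset coordinate names.

    depth_coord is the depth coordinate name of the variable group being read,
    or None for a dataset that contains only surface fields.
    """
    names = {
        "time": profile["time coord"]["name"],
        "y": profile["y coord"]["name"],
        "x": profile["x coord"]["name"],
    }
    if depth_coord is not None:
        names["depth"] = depth_coord
    # Conceptual keys without a matching coordinate (e.g. depth for a
    # surface dataset) are dropped.
    return {
        names[key]: size
        for key, size in profile["chunk size"].items()
        if key in names
    }


profile = {
    "time coord": {"name": "time_counter"},
    "y coord": {"name": "y"},
    "x coord": {"name": "x"},
    "chunk size": {"time": 1, "depth": 40, "y": 898, "x": 398},
}
chunks = actual_chunks(profile, depth_coord="deptht")
print(chunks)
```

A dictionary in this form is the kind of value that dask-backed dataset readers commonly accept as a chunks argument, although how Reshapr passes it on is not covered here.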
geo ref dataset
Stanza (Required)
A collection of items that define the dataset that provides the geolocation mapping between grid y/x indices and longitude/latitude values for the fields in the dataset.
Examples:
A geo ref dataset stanza for a dataset whose geolocation data is available from an ERDDAP server dataset:
geo ref dataset:
  path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02
  y coord: gridY
  x coord: gridX
A geo ref dataset stanza for a dataset whose geolocation data is available from a netCDF file. This example also shows how non-default names of the longitude/latitude variables in the geolocation dataset are handled:
geo ref dataset:
  path: /results/forcing/atmospheric/GEM2.5/gemlam/gemlam_y2007m01d03.nc
  y coord: y
  longitude var: nav_lon
  x coord: x
  latitude var: nav_lat
Stanza items:
path
(Required) The ERDDAP server URL or file system path of the dataset that provides the geolocation mapping between grid y/x indices and longitude/latitude values.
y coord
(Required) The name of the netCDF y-direction coordinate in the geolocation dataset.
longitude var
(Optional) The name of the netCDF longitude variable in the geolocation dataset if it is something other than longitude.
x coord
(Required) The name of the netCDF x-direction coordinate in the geolocation dataset.
latitude var
(Optional) The name of the netCDF latitude variable in the geolocation dataset if it is something other than latitude.
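The optional longitude var and latitude var items can be read with a fallback to the default variable names, as in this small sketch. The geo_ref_var_names function is a hypothetical name used for illustration only.

```python
def geo_ref_var_names(stanza):
    """Hypothetical helper: return (longitude, latitude) variable names.

    Falls back to "longitude" and "latitude" when the optional items
    are omitted from the geo ref dataset stanza.
    """
    return (
        stanza.get("longitude var", "longitude"),
        stanza.get("latitude var", "latitude"),
    )


erddap_stanza = {
    "path": "https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV17-02",
    "y coord": "gridY",
    "x coord": "gridX",
}
file_stanza = {
    "path": "/results/forcing/atmospheric/GEM2.5/gemlam/gemlam_y2007m01d03.nc",
    "y coord": "y",
    "longitude var": "nav_lon",
    "x coord": "x",
    "latitude var": "nav_lat",
}
erddap_names = geo_ref_var_names(erddap_stanza)
file_names = geo_ref_var_names(file_stanza)
print(erddap_names, file_names)
```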
extraction time origin
Item (Required)
The date to use as the netCDF time origin for datasets extracted from the model dataset.
The value is used to calculate the netCDF units attribute of the time coordinate in extracted datasets; e.g. days since 2015-01-01 12:00:00.
It is also used to calculate the time_origin and comment attributes of the time coordinate.
The value of the time_origin attribute is the date and time parts of the units attribute; e.g. 2015-01-01 12:00:00.
The value of the comment attribute is an explanation of the dataset’s time coordinate values; e.g.
time values are UTC at the centre of the intervals over which the calculated model results are averaged; e.g. the field average values for 8 February 2022 have a time value of 2022-02-08 12:00:00Z
The quanta (days, hours) and the time components in the units, time_origin, and comment attributes are determined by the value of the time base item in the extract Process Configuration File.
Example:
extraction time origin: 2015-01-01
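The relationship between the extraction time origin value and the units and time_origin attributes can be sketched as below. This is an illustration under stated assumptions, not Reshapr's code: the time_coord_attrs function is hypothetical, and the days quanta with a 12:00:00 offset are taken from the example above. The quanta and offset that apply to other time bases are not given here, so they are left as parameters rather than guessed.

```python
from datetime import date


def time_coord_attrs(origin, quanta="days", offset="12:00:00"):
    """Hypothetical helper: build units and time_origin attribute values.

    origin may be a datetime.date (YAML parsers often load 2015-01-01 as
    a date) or a string in YYYY-MM-DD form.
    """
    origin_text = origin.isoformat() if isinstance(origin, date) else str(origin)
    # time_origin is the date and time parts of the units attribute.
    time_origin = f"{origin_text} {offset}"
    return {
        "units": f"{quanta} since {time_origin}",
        "time_origin": time_origin,
    }


attrs = time_coord_attrs(date(2015, 1, 1))
print(attrs["units"])
print(attrs["time_origin"])
```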
results archive
Stanza (Required)
A nested collection of items that describe the file system paths and file names in which various groups of model variables are stored. The depth coordinate for each of the variable groups is also specified here.
Example:
results archive:
  path: /results/SalishSea/nowcast-green.201812/
  datasets:
    day:
      biology:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc"
        depth coord: deptht
      u velocity:
        file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
        depth coord: depthu
    hour:
      physics tracers:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
        depth coord: deptht
      u velocity:
        file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
        depth coord: depthu
Stanza items:
path
(Required) The absolute file system path of the directory tree in which the model dataset files are stored.
datasets
(Required) The sub-stanza key for the time base (e.g. day, hour) collections of model datasets.
  <time base>
  (Required) The sub-stanza key for the time base groups of model dataset variable groups. Must be one of day, hour, or month. The time base keys are used as the value for the time base item in the extract Process Configuration File as part of the specification of which dataset to extract variables from.
    days per file
    (Optional) An integer or string that specifies the number of days of model results that are stored in each file of the dataset. At present the only accepted values are 1 and month. Example:
      results archive:
        path: /results2/SalishSea/month-avg.202111/
        datasets:
          month:
            days per file: month
            biology:
              file pattern: "SalishSeaCast_1m_biol_T_{yyyymm01}_{yyyymm_end}.nc"
              depth coord: depth
    If days per file is not provided, its value defaults to 1.
    <variables group>
    (Required) The sub-stanza key(s) for the collections of model variables in particular dataset files. The variable group keys are used as the value for the variable group item in the extract Process Configuration File as part of the specification of which dataset to extract variables from.
      file pattern
      (Required) The dataset path/file pattern for the model variables in a group. The file patterns are relative to the model dataset path described above. Elements of the pattern in braces, e.g. {yyyymmdd}, are replaced by dates in the format indicated. For example, for the date 2022-05-27 these are some of the date format pattern elements and the resulting formatted date strings:
        ddmmmyy; e.g. 27may22
        yyyymmdd; e.g. 20220527
        yyyy; e.g. 2022
        nemo_yyyymmdd; e.g. y2022m05d27
        nemo_yyyymm; e.g. y2022m05
      The supported date format pattern elements are the names of the Date Formatters functions.
      depth coord
      (Required for all but purely surface datasets) The name of the netCDF depth coordinate in the variables group dataset.
      For datasets that contain only surface fields (i.e. none of the variables have a depth coordinate) the depth coord item is omitted. Example:
        results archive:
          path: /results/forcing/atmospheric/GEM2.5/operational/
          datasets:
            hour:
              surface fields:
                file pattern: "ops_{nemo_yyyymmdd}.nc"
      The depth coord item is required for datasets that contain a mixture of surface and depth-varying variables.
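To make the file pattern substitution concrete, the sketch below expands a pattern for one date and joins it to the archive path. It is an illustration only: FORMATTERS and dataset_path are hypothetical stand-ins for the Date Formatters functions, covering just the five pattern elements listed above, and the zero-padded day in ddmmmyy is an assumption. The month abbreviations are spelled out so that the result does not depend on the system locale.

```python
from datetime import date
from pathlib import PurePosixPath

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")

# Illustrative formatters keyed by pattern element name.
FORMATTERS = {
    "ddmmmyy": lambda d: f"{d.day:02d}{MONTHS[d.month - 1]}{d.year % 100:02d}",
    "yyyymmdd": lambda d: f"{d:%Y%m%d}",
    "yyyy": lambda d: f"{d:%Y}",
    "nemo_yyyymmdd": lambda d: f"y{d:%Y}m{d:%m}d{d:%d}",
    "nemo_yyyymm": lambda d: f"y{d:%Y}m{d:%m}",
}


def dataset_path(archive, time_base, variable_group, day):
    """Hypothetical helper: build the path of one dataset file for a date."""
    group = archive["datasets"][time_base][variable_group]
    # Format the date once per element; unused elements are ignored by format().
    fields = {name: fmt(day) for name, fmt in FORMATTERS.items()}
    return PurePosixPath(archive["path"]) / group["file pattern"].format(**fields)


archive = {
    "path": "/results/SalishSea/nowcast-green.201812/",
    "datasets": {
        "day": {
            "biology": {
                "file pattern": "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_ptrc_T.nc",
                "depth coord": "deptht",
            },
        },
    },
}
path = dataset_path(archive, "day", "biology", date(2022, 5, 27))
print(path)
```

A pattern that uses an element outside this table, such as {yyyymm01}, raises KeyError in this sketch.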