Analysis of NEMO Runs with Iona Wastewater Discharge
Susan is running various configurations of version 202111 that include a simulation of the Iona Island Wastewater Treatment Plant Deep Sea Outfall. Because those are “research run results”, in contrast to collections of daily results files from long-running hindcasts, the handling of the results files and the Reshapr model profile(s) is a little different.
Note
This section serves as a guide for use of Reshapr for other “research run” applications.
Notable differences include:
The research runs are executed on an HPC cluster in multi-day segments. For the Iona wastewater case the runs were done on graham. Initial runs were 5 days long for debugging, tuning, and initial analysis development by Jake. Subsequent runs were 1 month long because that fits well in the 12-hour walltime scheduler partition on graham.
The run results are downloaded from the HPC cluster to research storage on /ocean/$USER/ or /data/$USER/. For the Iona wastewater case the results were downloaded to directory trees in /data/sallen/results/MEOPAR/wastewater/ such as /data/sallen/results/MEOPAR/wastewater/long_run/.
The multi-day run results files like /data/sallen/results/MEOPAR/wastewater/long_run/SalishSea_1h_20180101_20180131_grid_T.nc must be split into 1-day files stored in date-named subdirectories like /data/sallen/results/MEOPAR/wastewater/long_run/01jan18/SalishSea_1h_20180101_20180101_grid_T.nc. At the moment, the best way to do that is via the SalishSeaCast automation nowcast.workers.split_results worker. Only Doug and Susan have the necessary permissions to run that worker. Please ask them for help if you need to split results from another research run.
The Reshapr model profile is maintained by the user doing the analysis rather than being included in the Reshapr code repository. Please see the Iona Wastewater Model Profile section below for details.
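The date-named layout described above can be illustrated with a short Python sketch. It only lists the 1-day file paths that splitting a multi-day file would be expected to produce; it does not touch any netCDF data and it is not the code of the split_results worker. The helper name day_file_paths is hypothetical, and the naming rule is inferred from the two example paths above.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def day_file_paths(results_dir, first_day, last_day, stem="SalishSea_1h", grid="grid_T"):
    """List the expected 1-day file paths from first_day to last_day (inclusive).

    Hypothetical helper: the layout is inferred from the example paths in the text.
    """
    paths = []
    day = first_day
    while day <= last_day:
        # Date-named subdirectory like 01jan18
        # (assumes English month abbreviations, i.e. the default C locale)
        ddmmmyy = day.strftime("%d%b%y").lower()
        yyyymmdd = day.strftime("%Y%m%d")
        file_name = f"{stem}_{yyyymmdd}_{yyyymmdd}_{grid}.nc"
        paths.append(PurePosixPath(results_dir) / ddmmmyy / file_name)
        day += timedelta(days=1)
    return paths


paths = day_file_paths(
    "/data/sallen/results/MEOPAR/wastewater/long_run",
    date(2018, 1, 1),
    date(2018, 1, 31),
)
print(paths[0])
print(len(paths), "day files expected")
```

A list like this can be handy for checking that every day of a run has its own subdirectory after the split has been done for you.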
File Organization and Executing Extractions
Store your model profile and extraction configuration YAML files in a Git repository such as your
analysis repository so that you can commit your changes to them and push them to GitHub to document
your analysis history and make it reproducible.
Here is an example from analysis-doug:
analysis-doug/
├── ...
├── notebooks
│   ├── ...
│   └── wastewater
│       ├── extract_biology.yaml
│       └── model_profiles
│           └── SalishSeaCast-202111-wastewater-salish.yaml
Store the results of your extractions outside of a Git repository, for example, /ocean/dlatorne/MOAD/extractions/.
Extracted netCDF files are large binary files.
Do not try to push them to GitHub.
If you commit them and push them to GitHub you will quickly exceed file and repository size limits.
They are products of the extraction process described by your model profile and extraction configuration YAML files.
So, having those YAML files under version control is sufficient to enable you to reproduce the extracted netCDF files.
Grab a copy of the model profile YAML file that Doug created: https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
Store your copy of that file in your analysis repository and commit it.
Grab a copy of the sample extraction configuration YAML file that Doug created: https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/extract_biology.yaml
Store your copy of that file in your analysis repository. Edit 2 lines of that file:
line 5 that starts with model profile: to set the absolute path to your copy of the model profile YAML file
line 33 that starts with dest dir: to set the absolute path to your directory where you will store the results of your extractions
Commit your modified file.
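If you prefer to script the two edits described above instead of using a text editor, something like the following Python sketch could work. It is a plain text substitution, not a YAML parser, and it only suits single-line values such as model profile: and dest dir:. The helper name set_yaml_value and the /ocean/yourname/ paths are hypothetical placeholders.

```python
def set_yaml_value(yaml_text, key, value):
    """Replace the value on the first line that starts with `key:`, keeping its indentation.

    Hypothetical helper; only meant for single-line values.
    """
    lines = yaml_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(f"{key}:"):
            indent = line[: len(line) - len(stripped)]
            lines[i] = f"{indent}{key}: {value}"
            return "\n".join(lines) + "\n"
    raise KeyError(f"no line starts with '{key}:'")


# Shortened stand-in for the sample extract_biology.yaml file
sample = """\
dataset:
  model profile: /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
  time base: day

extracted dataset:
  name: SalishSeaCast_wastewater_day_avg_biology
  dest dir: /ocean/dlatorne/MOAD/extractions/
"""

# ASSUMPTION: these paths are placeholders; substitute your own absolute paths
edited = set_yaml_value(
    sample,
    "model profile",
    "/ocean/yourname/MEOPAR/analysis-yourname/notebooks/wastewater/model_profiles/"
    "SalishSeaCast-202111-wastewater-salish.yaml",
)
edited = set_yaml_value(edited, "dest dir", "/ocean/yourname/MOAD/extractions/")
print(edited)
```

Whichever way you make the edits, review the result with git diff before you commit it.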
In a terminal session on salish, activate your reshapr conda environment, and do a test extraction.
For Doug, that looks like:
cd /ocean/dlatorne/MEOPAR/analysis-doug/
analysis-doug$ conda activate reshapr
(/home/dlatorne/conda_envs/reshapr) analysis-doug$ reshapr extract notebooks/wastewater/extract_biology.yaml
2023-10-19 12:13:43 [info ] loaded config config_file=notebooks/wastewater/extract_biology.yaml
2023-10-19 12:13:43 [info ] loaded model profile model_profile_yaml=/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
2023-10-19 12:13:48 [info ] dask cluster dashboard dashboard_link=http://127.0.0.1:8787/status dask_config_yaml=/ocean/dlatorne/MOAD/Reshapr-10jul23/cluster_configs/salish_cluster.yaml
2023-10-19 12:13:49 [info ] extracting variables
2023-10-19 12:13:49,882 - distributed.nanny - WARNING - Restarting worker
2023-10-19 12:13:50 [info ] wrote netCDF4 file nc_path=/ocean/dlatorne/MOAD/extractions/SalishSeaCast_wastewater_day_avg_biology_20180101_20180102.nc
2023-10-19 12:13:50 [info ] total time t_total=7.281958341598511
Be sure to use the path (relative or absolute) to your extraction YAML file in the reshapr extract command.
Changing the Extraction Parameters
Here are the contents of the example extract_biology.yaml file:
 1 # Reshapr configuration to extract day-averages of interesting biology variables
 2 # near Iona Island wastewater outfall
 3
 4 dataset:
 5   model profile: /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
 6   time base: day
 7   variables group: biology
 8
 9 dask cluster: salish_cluster.yaml
10
11 start date: 2018-01-01
12 end date: 2018-01-02
13 extract variables:
14   - ammonium
15   - nitrate
16   - diatoms
17
18 selection:
19   depth:
20     # NOTE: use depth level numbers, not depths in meters
21     depth max: 30
22   grid y:
23     y min: 430
24     y max: 471
25   grid x:
26     x min: 280
27     x max: 321
28
29 extracted dataset:
30   name: SalishSeaCast_wastewater_day_avg_biology
31   description: Day-averaged ammonium, nitrate & diatoms extracted from SalishSeaCast v202111
32                NEMO model with wastewater outfalls
33   dest dir: /ocean/dlatorne/MOAD/extractions/
Version Control Your Extraction YAML Files
As you build your collection of extraction YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible.
Start and/or End Dates
You can change the start and/or end dates for the extraction by editing the start date: and/or end date: lines in the YAML file.
Alternatively, you can use the --start-date and/or --end-date command-line options in the reshapr extract command to override the start and/or end dates in the YAML file.
Use reshapr extract --help to see the details of how to do that.
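For research runs done in 1-month segments, the command-line overrides make it possible to reuse one YAML file for several months. The Python sketch below only builds the command lines; it does not run them. The helper name monthly_extract_commands is hypothetical, and the yyyy-mm-dd form of the option values is an assumption based on the dates in the YAML file, so please confirm it with reshapr extract --help.

```python
import calendar
from datetime import date


def monthly_extract_commands(config_yaml, year, months):
    """Build one `reshapr extract` argument list per month (hypothetical helper)."""
    commands = []
    for month in months:
        last_day = calendar.monthrange(year, month)[1]
        start = date(year, month, 1)
        end = date(year, month, last_day)
        commands.append([
            "reshapr", "extract", config_yaml,
            # ASSUMPTION: the options accept yyyy-mm-dd values; check --help
            "--start-date", start.isoformat(),
            "--end-date", end.isoformat(),
        ])
    return commands


commands = monthly_extract_commands(
    "notebooks/wastewater/extract_biology.yaml", 2018, [1, 2]
)
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can be pasted into a terminal session in which your reshapr conda environment is activated.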
Variables
You can change the variables that you extract by changing the variables group: name in line 7, and the list of variable names in the lines following the extract variables: key at line 13.
To learn the names of the available variable groups and the variables in them, use the reshapr info command with the path and file name of your model profile.
For example:
reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml:
SalishSeaCast version 202111 NEMO with wastewater outfalls results
on storage accessible from salish.
variable groups from time intervals in this model:
  day
    biology
    chemistry
    biology growth rates
    grazing
    light
    mortality
    physics tracers
    vvl grid
  hour
    biology
    chemistry
    light
    physics tracers
    turbulence
    u velocity
    v velocity
    vvl grid
    w velocity
Please use reshapr info model-profile time-interval variable-group
(e.g. reshapr info SalishSeaCast-201905 hour biology)
to get the list of variables in a variable group.
Please use reshapr info --help to learn how to get other information,
or reshapr --help to learn about other sub-commands.
That output shows the lists of variable groups, divided into day-averaged and hour-averaged collections. To see the list of variables in the day-averaged physics tracers variable group, use:
reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml day physics tracers
/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml:
SalishSeaCast version 202111 NEMO with wastewater outfalls results
on storage accessible from salish.
day-averaged variables in physics tracers group:
- sossheig : Sea Surface Height [m]
- votemper : Conservative Temperature [degree_C]
- vosaline : Reference Salinity [g kg-1]
- sigma_theta : Potential Density (sigma_theta) [kg m-3]
- e3t : T-cell Thickness [m]
Please use reshapr info --help to learn how to get other information,
or reshapr --help to learn about other sub-commands.
Depth-y-x Slab Selection
You can change the depth, y direction, and x direction limits of your extraction by editing the selection: section that starts on line 18.
Remember that Python uses 0-based indexing and that Python intervals are open on the right.
So, to get the y grid points from 430 to 470 you need to use:
selection:
  grid y:
    y min: 430
    y max: 471
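The right-open interval can be checked in plain Python. This small sketch uses range() as a stand-in for the index selection; it shows how Python intervals behave and is not Reshapr code.

```python
y_min, y_max = 430, 471

# A right-open interval [y_min, y_max) behaves like Python's range() and slices:
# y_min is included, y_max is not
y_points = list(range(y_min, y_max))

print(y_points[0], y_points[-1], len(y_points))
# prints: 430 470 41
```

So a y max: value of 471 selects 41 grid points, the last of which is 470.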
Extraction File Name and Path
You can change the beginning of the file name that your extracted netCDF dataset file will be written to and the description in its metadata by editing the name: and description: values in lines 30 and 31.
With SalishSeaCast_wastewater_day_avg_biology as the value of name:, an extraction for 2018-01-01 to 2018-01-31 will produce a netCDF file called SalishSeaCast_wastewater_day_avg_biology_20180101_20180131.nc.
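The file naming rule in that example can be written as a one-line Python sketch, which may help when a notebook needs to locate the file that an extraction produced. The helper name extraction_file_name is hypothetical, and the rule is inferred from the example above rather than taken from the Reshapr code.

```python
from datetime import date


def extraction_file_name(name, start, end):
    """Compose name, start date, and end date into the file name (hypothetical helper)."""
    return f"{name}_{start:%Y%m%d}_{end:%Y%m%d}.nc"


file_name = extraction_file_name(
    "SalishSeaCast_wastewater_day_avg_biology", date(2018, 1, 1), date(2018, 1, 31)
)
print(file_name)
# prints: SalishSeaCast_wastewater_day_avg_biology_20180101_20180131.nc
```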
You can change the directory where your extracted netCDF dataset files will be written to by editing the dest dir: value in line 33.
As noted in File Organization and Executing Extractions, do not store extracted netCDF dataset files in a Git repository or try to commit and push them to GitHub - they are too large.
Iona Wastewater Model Profile
Here are the contents of the SalishSeaCast-202111-wastewater-salish.yaml file:
 1 description: SalishSeaCast version 202111 NEMO with wastewater outfalls results
 2              on storage accessible from salish.
 3
 4 time coord:
 5   name: time_counter
 6 y coord:
 7   name: y
 8 x coord:
 9   name: x
10
11 # Chunking scheme used for the netCDF4 files
12 # Note that coordinate names (keys) are conceptual here.
13 # They are replaced with actual coordinate names in files in the code;
14 # e.g. time is replaced by time_counter for dataset loading
15 chunk size:
16   time: 24
17   depth: 40
18   y: 898
19   x: 398
20
21 geo ref dataset:
22   path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV21-08
23   y coord: gridY
24   x coord: gridX
25
26 extraction time origin: 2007-01-01
27
28 results archive:
29   path: /data/sallen/results/MEOPAR/wastewater/long_run/
30   datasets:
31     day:
32       biology:
33         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_biol_T.nc"
34         depth coord: deptht
35       chemistry:
36         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
37         depth coord: deptht
38       biology growth rates:
39         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_prod_T.nc"
40         depth coord: deptht
41       grazing:
42         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc"
43         depth coord: deptht
44       light:
45         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
46         depth coord: deptht
47       mortality:
48         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc"
49         depth coord: deptht
50       physics tracers:
51         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
52         depth coord: deptht
53       vvl grid:
54         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
55         depth coord: deptht
56     hour:
57       biology:
58         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_biol_T.nc"
59         depth coord: deptht
60       chemistry:
61         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
62         depth coord: deptht
63       light:
64         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
65         depth coord: deptht
66       physics tracers:
67         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
68         depth coord: deptht
69       turbulence:
70         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
71         depth coord: depthw
72       u velocity:
73         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
74         depth coord: depthu
75       v velocity:
76         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_V.nc"
77         depth coord: depthv
78       vvl grid:
79         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
80         depth coord: deptht
81       w velocity:
82         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
83         depth coord: depthw
Version Control Your Model Profile Files
When you create new model profile YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible.
Change the Model Results Path
To work with model results in a different directory tree, change the value of path: in the results archive: section on line 29.
For example, if Susan does model runs with alkalinity added to the Iona wastewater discharge, she might store the run results in /data/sallen/results/MEOPAR/wastewater/alkalinity_added/.
If you are changing the model results path in a model profile, you should seriously consider storing the profile in a new file with a different name, updating the description: at the top of the file, and committing it to version control.
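To see what a changed path: value means for the files that get read, the Python sketch below fills in one of the file pattern: strings for a single day and joins it to the results archive path. It assumes that {ddmmmyy} and {yyyymmdd} are day stamps like those in the 1-day file names shown earlier, and that the path and the pattern are simply joined; the helper name results_file_path is hypothetical and this is not the Reshapr implementation.

```python
from datetime import date
from pathlib import PurePosixPath


def results_file_path(archive_path, file_pattern, day):
    """Fill the {ddmmmyy} and {yyyymmdd} placeholders for one day (hypothetical helper)."""
    relative = file_pattern.format(
        # e.g. 01jan18 (assumes English month abbreviations, the default C locale)
        ddmmmyy=day.strftime("%d%b%y").lower(),
        # e.g. 20180101
        yyyymmdd=day.strftime("%Y%m%d"),
    )
    return PurePosixPath(archive_path) / relative


biology_day_pattern = "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_biol_T.nc"
resolved = results_file_path(
    "/data/sallen/results/MEOPAR/wastewater/alkalinity_added/",
    biology_day_pattern,
    date(2018, 1, 1),
)
print(resolved)
```

Checking that a path resolved this way exists on disk is a quick test of a new model profile before you run an extraction with it.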