Analysis of NEMO Runs with Iona Wastewater Discharge

Susan is running various configurations of version 202111 that include a simulation of the Iona Island Wastewater Treatment Plant Deep Sea Outfall. Because those are “research run results”, in contrast to collections of daily results files from long-running hindcasts, the handling of the results files and the Reshapr model profile(s) is a little different.

Note

This section serves as a guide for using Reshapr in other “research run” applications.

Notable differences include:

The research runs are executed on an HPC cluster in multi-day segments. For the Iona wastewater case the runs were done on graham. Initial runs were 5 days long for debugging, tuning, and initial analysis development by Jake. Subsequent runs were 1 month long because that fits well in the 12-hour walltime scheduler partition on graham.
The run results are downloaded from the HPC cluster to research storage on /ocean/$USER/ or /data/$USER/. For the Iona wastewater case the results were downloaded to directory trees in /data/sallen/results/MEOPAR/wastewater/ such as /data/sallen/results/MEOPAR/wastewater/long_run/.
The multi-day run results files like /data/sallen/results/MEOPAR/wastewater/long_run/SalishSea_1h_20180101_20180131_grid_T.nc must be split into 1-day files stored in date-named subdirectories like /data/sallen/results/MEOPAR/wastewater/long_run/01jan18/SalishSea_1h_20180101_20180101_grid_T.nc. At the moment, the best way to do that is via the SalishSeaCast automation nowcast.workers.split_results worker. Only Doug and Susan have the necessary permissions to run that worker. Please ask them for help if you need to split results from another research run.
The Reshapr model profile is maintained by the user doing the analysis rather than it being included in the Reshapr code repository. Please see the Iona Wastewater Model Profile section below for details.
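
The 1-day file layout described in the list above can be illustrated with a short Python sketch. It only shows the naming convention; it is not the nowcast.workers.split_results worker and it does not split any files. The helper names ddmmmyy and daily_paths are hypothetical, and the sketch assumes that multi-day file names always look like SalishSea_1h_20180101_20180131_grid_T.nc.

```python
from datetime import date, datetime, timedelta
from pathlib import Path

# Month abbreviations are hard-coded so the result does not depend on locale.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def ddmmmyy(day: date) -> str:
    """Return the date-named directory for a day; e.g. 01jan18."""
    return f"{day.day:02d}{MONTHS[day.month - 1]}{day.year % 100:02d}"


def daily_paths(multi_day_file: str) -> list[Path]:
    """List the 1-day file paths expected after splitting a multi-day file.

    Hypothetical helper; assumes file names of the form
    SalishSea_<freq>_<start>_<end>_<grid>.nc
    """
    path = Path(multi_day_file)
    prefix, freq, start, end, grid = path.name.split("_", 4)
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    paths = []
    day = first
    while day <= last:
        stamp = day.strftime("%Y%m%d")
        name = f"{prefix}_{freq}_{stamp}_{stamp}_{grid}"
        paths.append(path.parent / ddmmmyy(day) / name)
        day += timedelta(days=1)
    return paths


paths = daily_paths(
    "/data/sallen/results/MEOPAR/wastewater/long_run/"
    "SalishSea_1h_20180101_20180131_grid_T.nc"
)
print(len(paths))
print(paths[0])
```

For the January 2018 file this lists 31 paths, the first of which is in the 01jan18/ subdirectory. A list like that can be handy for checking that a split finished completely.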

File Organization and Executing Extractions

Store your model profile and extraction configuration YAML files in a Git repository such as your analysis repository so that you can commit your changes to them and push them to GitHub to document your analysis history and make it reproducible. Here is an example from analysis-doug:

analysis-doug/
├── ...
├── notebooks
│   ├── ...
│   └── wastewater
│       ├── extract_biology.yaml
│       └── model_profiles
│           └── SalishSeaCast-202111-wastewater-salish.yaml

Store the results of your extractions outside of a Git repository, for example, /ocean/dlatorne/MOAD/extractions/. Extracted netCDF files are large binary files. Do not try to push them to GitHub. If you commit them and push them to GitHub you will quickly exceed file and repository size limits. They are products of the extraction process described by your model profile and extraction configuration YAML files. So, having those YAML files under version control is sufficient to enable you to reproduce the extracted netCDF files.

Grab a copy of the model profile YAML file that Doug created: https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml Store your copy of that file in your analysis repository and commit it.

Grab a copy of the sample extraction configuration YAML file that Doug created: https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/extract_biology.yaml Store your copy of that file in your analysis repository. Edit 2 lines of that file:

line 5 that starts with model profile: to set the absolute path to your copy of the model profile YAML file
line 33 that starts with dest dir: to set the absolute path to your directory where you will store the results of your extractions

Commit your modified file.

In a terminal session on salish, activate your reshapr conda environment, and do a test extraction. For Doug, that looks like:

cd /ocean/dlatorne/MEOPAR/analysis-doug/
analysis-doug$ conda activate reshapr
(/home/dlatorne/conda_envs/reshapr) analysis-doug$ reshapr extract notebooks/wastewater/extract_biology.yaml
2023-10-19 12:13:43 [info     ] loaded config                  config_file=notebooks/wastewater/extract_biology.yaml
2023-10-19 12:13:43 [info     ] loaded model profile           model_profile_yaml=/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
2023-10-19 12:13:48 [info     ] dask cluster dashboard         dashboard_link=http://127.0.0.1:8787/status dask_config_yaml=/ocean/dlatorne/MOAD/Reshapr-10jul23/cluster_configs/salish_cluster.yaml
2023-10-19 12:13:49 [info     ] extracting variables
2023-10-19 12:13:49,882 - distributed.nanny - WARNING - Restarting worker
2023-10-19 12:13:50 [info     ] wrote netCDF4 file             nc_path=/ocean/dlatorne/MOAD/extractions/SalishSeaCast_wastewater_day_avg_biology_20180101_20180102.nc
2023-10-19 12:13:50 [info     ] total time                     t_total=7.281958341598511

Be sure to use the path (relative or absolute) to your extraction YAML file in the reshapr extract command.

Changing the Extraction Parameters

Here are the contents of the example extract_biology.yaml file:

 # Reshapr configuration to extract day-averages of interesting biology variables
 # near Iona Island wastewater outfall

 dataset:
   model profile: /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
   time base: day
   variables group: biology

 dask cluster: salish_cluster.yaml

 start date: 2018-01-01
 end date: 2018-01-02
 extract variables:
   - ammonium
   - nitrate
   - diatoms

 selection:
   depth:
     # NOTE: use depth level numbers, not depths in meters
     depth max: 30
   grid y:
     y min: 430
     y max: 471
   grid x:
     x min: 280
     x max: 321

 extracted dataset:
   name: SalishSeaCast_wastewater_day_avg_biology
   description: Day-averaged ammonium, nitrate & diatoms extracted from SalishSeaCast v202111
                NEMO model with wastewater outfalls
   dest dir: /ocean/dlatorne/MOAD/extractions/

Version Control Your Extraction YAML Files

As you build your collection of extraction YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible.

Start and/or End Dates

You can change the start and/or end dates for the extraction by editing the start date: and/or end date: lines in the YAML file. Alternatively, you can use the --start-date and/or --end-date command-line options in the reshapr extract command to override the start and/or end dates in the YAML file. Use reshapr extract --help to see the details of how to do that.
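
As a rough illustration of the command-line override, the Python sketch below builds one reshapr extract command per calendar month. The helper name monthly_extract_commands is hypothetical. The sketch assumes that --start-date and --end-date accept the same YYYY-MM-DD form used in the YAML file; please confirm that with reshapr extract --help before relying on it.

```python
import calendar
from datetime import date


def monthly_extract_commands(config_yaml: str, year: int, months: range) -> list[list[str]]:
    """Build one `reshapr extract` command (as an argument list) per month.

    Hypothetical helper; the date format passed to the options is an assumption.
    """
    commands = []
    for month in months:
        last_day = calendar.monthrange(year, month)[1]  # number of days in month
        start = date(year, month, 1)
        end = date(year, month, last_day)
        commands.append([
            "reshapr", "extract", config_yaml,
            "--start-date", start.isoformat(),
            "--end-date", end.isoformat(),
        ])
    return commands


for cmd in monthly_extract_commands(
    "notebooks/wastewater/extract_biology.yaml", 2018, range(1, 4)
):
    # Only print the commands here; nothing is executed.
    print(" ".join(cmd))
```

Each argument list could be passed to subprocess.run(cmd, check=True) in a terminal session where the reshapr conda environment is active. Building the commands separately from running them makes it easy to inspect them first.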

Variables

You can change the variables that you extract by changing the variables group: name in line 7, and the list of variable names in the lines following the extract variables: key at line 13. To learn the names of the available variable groups and the variables in them, use the reshapr info command with the path and file name of your model profile. For example:

reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml:
  SalishSeaCast version 202111 NEMO with wastewater outfalls results
  on storage accessible from salish.

variable groups from time intervals in this model:
  day
    biology
    chemistry
    biology growth rates
    grazing
    light
    mortality
    physics tracers
    vvl grid
  hour
    biology
    chemistry
    light
    physics tracers
    turbulence
    u velocity
    v velocity
    vvl grid
    w velocity

Please use reshapr info model-profile time-interval variable-group
(e.g. reshapr info SalishSeaCast-201905 hour biology)
to get the list of variables in a variable group.

Please use reshapr info --help to learn how to get other information,
or reshapr --help to learn about other sub-commands.

shows the lists of variable groups, divided into day-averaged and hour-averaged collections. From that we can see the list of variables in the day-averaged physics tracers variable group with:

reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml day physics tracers
/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml:
  SalishSeaCast version 202111 NEMO with wastewater outfalls results
  on storage accessible from salish.
day-averaged variables in physics tracers group:
  - sossheig : Sea Surface Height [m]
  - votemper : Conservative Temperature [degree_C]
  - vosaline : Reference Salinity [g kg-1]
  - sigma_theta : Potential Density (sigma_theta) [kg m-3]
  - e3t : T-cell Thickness [m]

Please use reshapr info --help to learn how to get other information,
or reshapr --help to learn about other sub-commands.

Depth-y-x Slab Selection

You can change the depth, y direction, and x direction limits of your extraction by editing the selection: section that starts on line 18. Remember that Python uses 0-based indexing and that Python intervals are open on the right. So, to get the y grid points from 430 to 470 you need to use:

selection:
  grid y:
    y min: 430
    y max: 471
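
The right-open interval rule can be checked with a few lines of Python. This is a minimal sketch that assumes the selection limits behave like a Python range, as described above; the helper name index_limits is hypothetical.

```python
def index_limits(first: int, last: int) -> dict[str, int]:
    """Convert an inclusive index range to min/max for a right-open interval."""
    if last < first:
        raise ValueError("last must not be less than first")
    return {"min": first, "max": last + 1}


limits = index_limits(430, 470)
selected = range(limits["min"], limits["max"])
print(limits, len(selected), selected[0], selected[-1])
```

For grid points 430 to 470 inclusive this gives a min of 430 and a max of 471, which selects 41 points ending at index 470.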

Extraction File Name and Path

You can change the beginning of the file name that your extracted netCDF dataset file will be written to and the description in its metadata by editing the name: and description: values in lines 30 and 31. With SalishSeaCast_wastewater_day_avg_biology as the value of name:, an extraction for 2018-01-01 to 2018-01-31 will produce a netCDF file called SalishSeaCast_wastewater_day_avg_biology_20180101_20180131.nc.
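
If you want to predict the name of an extraction file in a notebook or script, the naming pattern above can be sketched as follows. The helper name extraction_file_name is hypothetical; it mirrors the pattern seen in the example above rather than reproducing Reshapr's own code.

```python
from datetime import date


def extraction_file_name(name: str, start: date, end: date) -> str:
    """Return <name>_<yyyymmdd start>_<yyyymmdd end>.nc (hypothetical helper)."""
    return f"{name}_{start:%Y%m%d}_{end:%Y%m%d}.nc"


print(
    extraction_file_name(
        "SalishSeaCast_wastewater_day_avg_biology",
        date(2018, 1, 1),
        date(2018, 1, 31),
    )
)
```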

You can change the directory where your extracted netCDF dataset files will be written to by editing the dest dir: value in line 33. As noted in File Organization and Executing Extractions, do not store extracted netCDF dataset files in a Git repository or try to commit and push them to GitHub because they are too large.

Iona Wastewater Model Profile

Here are the contents of the SalishSeaCast-202111-wastewater-salish.yaml file:

 description: SalishSeaCast version 202111 NEMO with wastewater outfalls results
              on storage accessible from salish.

 time coord:
   name: time_counter
 y coord:
   name: y
 x coord:
   name: x

 # Chunking scheme used for the netCDF4 files
 # Note that coordinate names (keys) are conceptual here.
 # They are replaced with actual coordinate names in files in the code;
 # e.g. time is replaced by time_counter for dataset loading
 chunk size:
   time: 24
   depth: 40
   y: 898
   x: 398

 geo ref dataset:
   path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV21-08
   y coord: gridY
   x coord: gridX

 extraction time origin: 2007-01-01

 results archive:
   path: /data/sallen/results/MEOPAR/wastewater/long_run/
   datasets:
     day:
       biology:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_biol_T.nc"
         depth coord: deptht
       chemistry:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
         depth coord: deptht
       biology growth rates:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_prod_T.nc"
         depth coord: deptht
       grazing:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc"
         depth coord: deptht
       light:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
         depth coord: deptht
       mortality:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc"
         depth coord: deptht
       physics tracers:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
         depth coord: deptht
       vvl grid:
         file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
         depth coord: deptht
     hour:
       biology:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_biol_T.nc"
         depth coord: deptht
       chemistry:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
         depth coord: deptht
       light:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
         depth coord: deptht
       physics tracers:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
         depth coord: deptht
       turbulence:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
         depth coord: depthw
       u velocity:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
         depth coord: depthu
       v velocity:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_V.nc"
         depth coord: depthv
       vvl grid:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
         depth coord: deptht
       w velocity:
         file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
         depth coord: depthw
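
To see how a file pattern in the profile maps to a results file for a particular day, here is a minimal Python sketch. It assumes that the {ddmmmyy} and {yyyymmdd} placeholders are filled in like str.format fields; how Reshapr does that internally is not shown in this section. The helper name expand_file_pattern is hypothetical.

```python
from datetime import date
from pathlib import Path

# Month abbreviations are hard-coded so the result does not depend on locale.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def expand_file_pattern(archive_path: str, file_pattern: str, day: date) -> Path:
    """Fill the date placeholders of a file pattern for one day.

    Hypothetical helper; assumes str.format style placeholders.
    """
    ddmmmyy = f"{day.day:02d}{MONTHS[day.month - 1]}{day.year % 100:02d}"
    yyyymmdd = day.strftime("%Y%m%d")
    return Path(archive_path) / file_pattern.format(ddmmmyy=ddmmmyy, yyyymmdd=yyyymmdd)


results_file = expand_file_pattern(
    "/data/sallen/results/MEOPAR/wastewater/long_run/",
    "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_biol_T.nc",
    date(2018, 1, 1),
)
print(results_file)
```

For 2018-01-01 the day-averaged biology pattern expands to 01jan18/SalishSea_1d_20180101_20180101_biol_T.nc under the results archive path. Calling results_file.exists() for each day of a planned extraction is one way to spot missing or unsplit results before running reshapr extract.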

Version Control Your Model Profile Files

When you create new model profile YAML files remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible.

Change the Model Results Path

To work with model results in a different directory tree, change the value of path: in the results archive: section (line 29 of the listing above). For example, if Susan does model runs with alkalinity added to the Iona wastewater discharge, she might store the run results in /data/sallen/results/MEOPAR/wastewater/alkalinity_added/.

If you are changing the model results path in a model profile, you should seriously consider storing the profile in a new file with a different name, updating the description: at the top of the file, and committing it to version control.