Dask Clusters

Intro - coming soon

On-Demand Cluster - The Easy Way

Coming soon…

Persistent Cluster

For applications where multiple Reshapr processes need to be run, setting up and managing a persistent dask cluster that your processes connect to and use avoids the overhead of starting up a cluster for each Reshapr process.

An example of the kind of processing where this approach is useful is running reshapr extract in a bash loop to resample day-averaged datasets into month-averaged datasets. That is a post-processing step that is done as part of running a SalishSeaCast hindcast. The cluster scheduler, the workers, and the bash loop are each run in a separate terminal in a tmux session on salish. An ssh tunnel can be set up to connect a browser session to the cluster dashboard for monitoring and analysis of the processing.
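For concreteness, here is a sketch of what such a loop can look like. The per-month configuration file names are hypothetical placeholders, and the cluster connection details are assumed to be set in those files (see step 4 below):

    # resample each month of day-averaged results to a month-averaged dataset
    (reshapr)$ for mm in {01..12}; do reshapr extract month-avg-2019-${mm}.yaml; done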

Note

In contrast to the on-demand clusters that are created when you just run a reshapr extract command, persistent clusters must be managed. If you create a persistent cluster, it is your responsibility to shut it down when you are finished with it; a sketch of doing that is shown at the end of this section.

If you know that another group member is also using a persistent cluster, consider coordinating with them to use the same cluster instead of spinning up a new cluster.

Here is a step-by-step example of using a persistent cluster to run reshapr extract in a bash loop to resample day-averaged datasets to get month-averaged datasets:

  1. Create a new tmux session on salish:

    $ tmux new -s month-avg-201905
    
  2. In the first tmux terminal, activate your reshapr conda environment and launch the dask-scheduler:

    $ conda activate reshapr
    (reshapr)$ dask-scheduler
    

    Use Control-b , to rename the tmux terminal to dask-scheduler.

    Make a note of the IP address and port numbers for the scheduler and dashboard in the log output; e.g.

    2022-06-16 12:15:58 - distributed.scheduler - INFO - Scheduler at: tcp://142.103.36.12:8786
    2022-06-16 12:15:58 - distributed.scheduler - INFO - dashboard at:                    :8787
    

    8786 and 8787 are the default scheduler and dashboard port numbers, respectively, but you may see different port numbers if there are other clusters already running.
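    If you want to check for other clusters before launching, one way (a sketch that assumes the ss utility available on most Linux systems) is to list the listening TCP ports in the 8700s:

    $ ss -tln | grep -E ':87[0-9]{2}'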

  3. Start a second tmux terminal with Control-b c, activate your reshapr conda environment, and launch the first dask-worker as a background process using the scheduler IP address and port number noted above:

    $ conda activate reshapr
    (reshapr)$ dask-worker --nworkers=1 --nthreads=4 142.103.36.12:8786 &
    

    Use Control-b , to rename the tmux terminal to dask-workers.

    Additional workers can be added to the cluster by repeating the same dask-worker command. The log output in the dask-scheduler terminal (Control-b 0) will show the workers joining the cluster.
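    For example, to add 3 more workers in one go (the worker count here is arbitrary), you can wrap the same command in a short loop:

    (reshapr)$ for w in {1..3}; do dask-worker --nworkers=1 --nthreads=4 142.103.36.12:8786 & done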

  4. Start a third tmux terminal with Control-b c and activate your reshapr conda environment there too. This is the terminal in which you will run reshapr extract commands.

    To run those commands on the persistent cluster, set the value of the dask cluster item in your extract Process Configuration File to the scheduler IP address and port number noted above; e.g.

    dask cluster: 142.103.36.12:8786
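
    With that configuration item in place, each reshapr extract run in this terminal uses the persistent cluster; e.g. (the configuration file name here is a hypothetical placeholder):

    (reshapr)$ reshapr extract month-avg-2019-01.yaml

    The dask-scheduler terminal log (Control-b 0) and the dashboard will show the tasks from the run being distributed to the workers.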
    
  5. Optional: To monitor the cluster in your browser on your laptop or workstation, start a terminal session there and set up an ssh tunnel to the scheduler’s dashboard port:

    $ ssh -N -L 8787:salish:8787 salish
    

    That command creates an ssh tunnel between port 8787 on your laptop/workstation and port 8787 on salish. You can use any number ≥1024 you want instead of 8787 as the local port number on your laptop/workstation. The number after :salish: has to be the scheduler’s dashboard port number noted above. The command also assumes that you have an entry for salish in your ~/.ssh/config file.
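    If you don't already have such an entry, a minimal one looks something like this sketch; the HostName and User values here are placeholder assumptions that you need to adjust for your own setup:

    Host salish
        HostName salish.eos.ubc.ca
        User your-userid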

    Open a new tab in the browser on your laptop/workstation and go to http://localhost:8787/ to see the cluster dashboard.
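
When the processing is finished, remember that shutting the persistent cluster down is your responsibility (see the Note above). A minimal sketch, assuming a single background worker job in the dask-workers terminal:

    (reshapr)$ jobs       # list the background dask-worker jobs
    (reshapr)$ kill %1    # stop worker job 1; repeat for any other jobs

Stop the scheduler with Control-C in the dask-scheduler terminal, then exit all of the terminals to end the tmux session.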