.. _Spine Toolbox on HPC: ======================================= Running Spine Toolbox projects on HPC's ======================================= .. contents:: :depth: 1 :local: ************ Introduction ************ This tutorial demonstrates how to run Spine Toolbox project involving `GAMS `_ workflows on a High-Performance Computing (HPC) system using the Slurm scheduler. The guide assumes you have access to a Linux-based HPC cluster with a shared filesystem and Slurm installed. You also need basic familiarity with the Linux command line and a valid Gams license for real use cases. However, this tutorial can be completed with a demo license for Gams. You will learn the full workflow: :: Local machine → Cluster login → Upload files → Submit job → Monitor → Retrieve results **************************************************** HPC's with container support (apptainer/singularity) **************************************************** The easiest way to run Spine Toolbox projects involving GAMS is to use *apptainer* containers. Log in to the Login node of your HPC and check if *apptainer* is available in your HPC with the following command: .. code-block:: bash apptainer --version If this fails, try checking if *apptainer* is available as a module: .. code-block:: bash module avail apptainer If the response is a version number or a list of package names and version numbers, you are good to continue to the next section. If you see an error message or something like 'module not available', skip the next section and continue from (`HPC's without container support`_). .. Note:: If the previous commands failed, you can still try if `singularity` is available with `singularity --version` or `module avail singularity`. Apptainer was previously called singularity. Building the container ---------------------- Apptainer is an open source container platform designed for ease-of-use on shared systems and in high performance computing (HPC) environments. The container is a single file (.sif), which can be built by using an Apptainer image definition (.def) file. Building the container requires using Linux or Windows Subsystem for Linux (WSL) on Windows. The following instructions are for WSL (v2+) on Windows with an Ubuntu distro (tested on Ubuntu 24.04). Please make sure you have WSL version 2 or later since version 1 is being phased out as obsolete. If you don't have WSL installed, please contact your organizations IT department for help. Building Apptainer containers is done using *.def* files. You can `download and save hpc_container.def on your own system here <../_static/hpc_container.def>`_. Save `hpc_container.def` file into a mounted drive (for example, `/mnt/c/users//hpc/hpc_container.def`) for easier access. The file installs the following apps into the container: - Ubuntu 26.04 - Python 3.13 - Spine Toolbox (latest release) - Gams 53.5 .. note:: Julia and SpineOpt are not included in this container. If you need them, they can be easily added to the .def file if you want to run Spine Toolbox projects with SpineOpt tools on an HPC as well. To start building the container, open command prompt or powershell on Windows and type .. code-block:: bash wsl Cd to `/mnt/c/users//hpc/` or where ever you saved the **hpc_container.def** file. Install `Go` .. code-block:: bash sudo apt install -y golang Install `apptainer` by cloning the repo and building from sources .. code-block:: bash git clone https://github.com/apptainer/apptainer.git cd apptainer ./mconfig make -C builddir sudo make -C builddir install Ensure `fakeroot` is configured .. code-block:: bash sudo apt install fakeroot uidmap Build the container by running .. code-block:: bash apptainer build --fakeroot hpc_container.sif hpc_container.def Why --fakeroot? See https://apptainer.org/docs/user/latest/fakeroot.html#fakeroot-feature When the build process has completed, if you want to check that everything works, you can use the `shell` command to spawn a new shell within your container and interact with it as though it were a virtual machine. .. code-block:: bash apptainer shell hpc_container.sif For example, you can check the versions of Python and Gams with `python --version` and `gams ?` respectively inside the shell. Type `exit` to close the container shell, type `exit` to close wsl and then close the terminal. Running a Spine Toolbox project on an HPC ----------------------------------------- In this section, you need the following: - Spine Toolbox project with a Gams Tool (test project available in /execution_tests/gams_on_hpc_tutorial) - Container file (**hpc_container.sif**) - GAMS license file (optional for this tutorial; required for real use cases) - Slurm script .. attention:: It is recommended to run all **Tool** items in your Spine Toolbox project in *"source directory"* mode. You can verify this by opening the project in Spine Toolbox on your local machine, selecting each Tool, and checking that the **source dir** option is enabled in the Tool properties. Preparing files on the HPC ++++++++++++++++++++++++++ Upload all required files to your HPC's home directory using SCP, WinSCP or rsync. We will be using `gams_on_hpc_tutorial` project in this tutorial: 1. Upload container: ``$HOME/spinetoolbox/sifs/hpc_container.sif`` 2. Upload project: ``$HOME/spinetoolbox/projects/gams_on_hpc_tutorial`` 3. Upload GAMS license: ``$HOME/spinetoolbox/licenses/gamslic.txt`` 4. Create a Slurm script file: ``$HOME/spinetoolbox/projects/gams_on_hpc_tutorial/run_on_hpc.sh`` with the following content .. code-block:: bash #!/bin/bash #SBATCH --job-name=spinetoolbox_on_hpc #SBATCH --output=%j.out #SBATCH --error=%j.err #SBATCH --time=00:30:00 #SBATCH --cpus-per-task=1 #SBATCH --mem=4G # Make folder for Slurm output logs mkdir -p logs # Load apptainer. Uncomment if apptainer is available as a module. # module load apptainer set -euxo pipefail # Exit on Error # ---------------------------- # User configuration # ---------------------------- SUBMIT_DIR="$SLURM_SUBMIT_DIR" PROJECT_NAME=gams_on_hpc_tutorial HOME_BASE="$HOME/spinetoolbox" # Choose ONE of these (uncomment the appropriate line) BASE_TMP="${SCRATCH:-}" # Recommended if available # BASE_TMP="${WORK:-}" # Alternative on some systems # BASE_TMP="${TMPDIR:-}" # Often set automatically by Slurm # Fallback if chosen variable is empty if [ -z "$BASE_TMP" ]; then echo "Warning: selected BASE_TMP is not set, falling back to \$HOME/tmp" BASE_TMP="$HOME/tmp" fi SCRATCH_BASE="$BASE_TMP/spinetoolbox_runs/$SLURM_JOB_ID" mkdir -p "$SCRATCH_BASE" echo "Using temporary directory: $SCRATCH_BASE" # ---------------------------- # Stage data # ---------------------------- echo "Copying project to scratch..." rsync -av "$HOME_BASE/$PROJECT_NAME/" "$SCRATCH_BASE/$PROJECT_NAME/" # If license is available, uncomment this # rsync -av "$HOME_BASE/licenses/gamslic.txt" "$SCRATCH_BASE/" cd $SCRATCH_BASE/$PROJECT_NAME # ---------------------------- # Run container # ---------------------------- echo "Running Spine Toolbox..." apptainer exec \ --bind $SCRATCH_BASE:$SCRATCH_BASE \ --bind $HOME_BASE:$HOME_BASE \ $HOME_BASE/sifs/hpc_container.sif \ spinetoolbox --execute-only $SCRATCH_BASE/$PROJECT_NAME/ \ > spinetoolbox.log 2>&1 # ---------------------------- # Copy results back # ---------------------------- echo "Listing results directory:" ls -R $SCRATCH_BASE/$PROJECT_NAME echo "Copying results back to home..." rsync -avh $SCRATCH_BASE/$PROJECT_NAME/ $HOME_BASE/$PROJECT_NAME/ # ----------------------------------- # Move log files to dedicated folder # ----------------------------------- LOG_DIR="$HOME_BASE/$PROJECT_NAME/logs/$SLURM_JOB_ID" mkdir -p "$LOG_DIR" mv "$SUBMIT_DIR/${SLURM_JOB_ID}.out" "$LOG_DIR/out.txt" 2>/dev/null || true mv "$SUBMIT_DIR/${SLURM_JOB_ID}.err" "$LOG_DIR/err.txt" 2>/dev/null || true mv "$HOME_BASE/$PROJECT_NAME/spinetoolbox.log" "$LOG_DIR/spinetoolbox.log" echo "Done." .. attention:: Line endings in Slurm scripts must be Unix style (LF). The ``run_on_hpc.sh`` script stages a Spine Toolbox project to a temporary working directory on the HPC system, runs it inside an Apptainer container, and then copies the results back to the original project location. This approach ensures efficient use of the HPC filesystem by performing computation on a fast scratch or temporary storage area while preserving results in the user’s home directory. The script uses ``rsync`` for data transfer, which provides a robust and reliable way to copy project files between locations while preserving file attributes and ensuring that all files, including hidden ones, are transferred correctly. During execution, all model output is written to the staged project directory in scratch space. After completion, the results are synchronized back to the original project directory in the user’s home folder. For reproducibility and debugging, the script also collects log files from each run. Standard Slurm output and error files, as well as the Spine Toolbox execution log, are organized into a dedicated directory ``logs//`` within the project. This ensures that logs from different runs are preserved and can be easily traced to a specific job. The folder structure on your HPC should look like this now: .. code-block:: text home/ └── spinetoolbox/ ├── sifs/ │ └── hpc_container.sif ├── projects/ │ └── gams_on_hpc_tutorial/ │ ├── .spinetoolbox/ │ │ ├── items/ │ │ │ └── ... │ │ ├── specifications/ │ │ │ └── ... │ │ └── project.json │ ├── run_on_hpc.sh │ ├── model.gms │ └── ... └── licenses/ └── gamslic.txt When you want to execute another Spine Toolbox project, copy the project under `/home/spinetoolbox/projects/` and add a separate `run_on_hpc.sh` Slurm script for that project. Editing the Slurm script for your HPC +++++++++++++++++++++++++++++++++++++ You may need to adjust the Slurm script (``run_on_hpc.sh``) to match your HPC environment: 1. **Apptainer module** Check whether Apptainer is available as a module on your system. If it is, uncomment the following line:: # module load apptainer 2. **Project name** Update the ``PROJECT_NAME`` variable to match your Spine Toolbox project folder name. For this tutorial, it should be:: PROJECT_NAME=gams_on_hpc_tutorial 3. **Temporary working directory** Check your HPC documentation for the recommended working or scratch filesystem. - If your system uses ``$SCRATCH``, no changes are needed. - Otherwise, update the ``BASE_TMP`` setting by commenting or uncommenting the appropriate line (e.g. ``$WORK`` or ``$TMPDIR``). - If none of these variables are available, you can define your own custom path. 4. **Slurm job parameters** Adjust the resource requests and output settings as needed: - ``--job-name``: Job name - ``--time``: Maximum runtime - ``--cpus-per-task``: Number of CPU cores - ``--mem``: Memory allocation - ``--output``: Output log file - ``--error``: Error log file Submit job to Slurm Scheduler +++++++++++++++++++++++++++++ When you are ready to execute the project, cd to home/spinetoolbox/projects/gams_on_hpc_tutorial and run .. code-block:: bash sbatch run_on_hpc.sh The response will be something like ``` Submitted batch job 1303767 ``` where 1303767 is the job id Check status of submitted job +++++++++++++++++++++++++++++ .. code-block:: bash squeue -j where ** is the id returned by the `sbatch` command. To check the status of all of your submitted tasks, run .. code-block:: bash squeue -u $USER If this command fails, replace $USER with your user name. When a job disappears from the the list returned by the `squeue` command, it is finished. Check job output files ++++++++++++++++++++++ Since `out.txt` and `err.txt` were given in the Slurm script as the values for *--output* and *--error*, you can find the stdout and stderr of your job in these files. The file `err.txt` is empty if everything is Ok. To view the files: .. code-block:: bash cat out.txt cat err.txt Final job status ++++++++++++++++ .. code-block:: bash sacct -j where ```` is the ID returned by the ``sbatch`` command. This command should return something like: .. code-block:: text JobID JobName Partition Account AllocCPUS State ExitCode ------------ ---------- ---------- ---------- ---------- ---------- -------- 1303767 spinetool+ all ba6401 1 COMPLETED 0:0 1303767.batch batch all ba6401 1 COMPLETED 0:0 1303767.extern extern all ba6401 1 COMPLETED 0:0 Live monitoring +++++++++++++++ .. code-block:: bash watch -n 2 squeue -u $USER Another option is to use `tail`: .. code-block:: bash tail -f out.txt Again, if $USER is not defined, replace it with your user name. This function tails the job progress and updates every two seconds. Checking the results ++++++++++++++++++++ The result files and output from executing the project will be inside the project item folders just like when executing the project in Spine Toolbox locally. You can check the results on the HPC, or transfer the project folder back to your local computer, start Spine Toolbox, and open the project there. ******************************* HPC's without container support ******************************* .. attention:: This section is a work in progress Verify GAMS installation: .. code-block:: bash gams ? If GAMS is installed correctly, this command prints version and usage information. Accessing GAMS on HPC --------------------- Option 1: Using a Module ++++++++++++++++++++++++ Many HPC systems provide GAMS via environment modules: .. code-block:: bash module avail gams module load gams Verify: .. code-block:: bash which gams Option 2: User Installation +++++++++++++++++++++++++++ If GAMS is not provided: 1. Download the Linux version from the GAMS website 2. Extract it in your home or project directory 3. Add it to your PATH: .. code-block:: bash export PATH=$HOME/gams:$PATH Ensure that your license file is accessible (e.g., ``gamslice.txt``). Common Issues and Troubleshooting --------------------------------- License Errors ++++++++++++++ - Ensure license file is accessible on compute nodes - Check environment variables if needed: .. code-block:: bash export GAMSLICE=/path/to/gamslice.txt File Not Found ++++++++++++++ - Verify paths are correct relative to the SLURM working directory - Use: .. code-block:: bash echo $PWD Job Stuck in Queue ++++++++++++++++++ - Cluster is full - Resource request too large Memory Errors +++++++++++++ Increase memory: .. code-block:: bash #SBATCH --mem=16G Solver Not Found ++++++++++++++++ .. code-block:: bash module load gurobi Check installation: .. code-block:: bash which gurobi_cl