Automatically allocating nodes by specifying the number of GPUs

User: "Rigoberto_20495"
Altair Community Member
Updated by Rigoberto_20495

Hello,

With Slurm, you can use "salloc --gpus=12" to allocate 12 GPUs, and Slurm will automatically allocate the number of nodes needed to satisfy the 12-GPU requirement. For example, if each GPU node has 4 GPUs, then "salloc --gpus=12" will automatically allocate 3 nodes, since 3 nodes x 4 GPUs each = 12 GPUs total.
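
For reference, this is roughly the Slurm-side workflow (the inspection command is just illustrative):

# Request 12 GPUs; Slurm derives the node count automatically.
salloc --gpus=12

# Inside the allocation, confirm which nodes Slurm chose:
srun hostname | sort | uniq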

Is this possible with PBS? I tried "qsub -l ngpus=12 ...", but "qstat" reports:

Can Never Run: Insufficient amount of resource: sales_op (none != )

I don't want to have to specify "-l nodes=" or "-l select=", since I would like the number of nodes that PBS allocates to be based solely on the number of GPUs requested.

Any suggestions on how to do this with PBS would be greatly appreciated.

Thank you.

    User: "Joshua Newman (Altair)"
    Altair Employee
    Accepted Answer
    Updated by Joshua Newman (Altair)

    Thank you, Joshua. I'm not sure I quite understand what you've suggested, though. The man page for "pbs_tmrsh" says:

    The program is intended to be used during MPI integration activities, and not by end-users.

    We don't want to use MPI to launch the job, because we don't know whether the customer will have MPI installed or, if they do, where it is installed or which flavor of MPI they have. We would rather stick with "pbsdsh", if possible, since it ships with PBS Pro.

    Is there a way that I can continue to use "pbsdsh" but modify the $PBS_NODEFILE prior to its execution, so that duplicate entries are removed and pbsdsh runs only one task per node?

    Currently, when I run:

    qsub -q rig_test_gpu -l "select=8:ngpus=1" -- /opt/pbs/bin/pbsdsh -- bash -c 'echo "$(hostname);CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"'

    The $PBS_NODEFILE has this content:

    node010.head.cm.us.cray.com
    node010.head.cm.us.cray.com
    node010.head.cm.us.cray.com
    node010.head.cm.us.cray.com
    node002.head.cm.us.cray.com
    node002.head.cm.us.cray.com
    node002.head.cm.us.cray.com
    node002.head.cm.us.cray.com

    If I can run "sort $PBS_NODEFILE | uniq" prior to executing "pbsdsh", then it would contain:

    node002.head.cm.us.cray.com
    node010.head.cm.us.cray.com

    and only two tasks, one on each node, would hopefully get executed.

    Is there any way to add some kind of pre-execution script that qsub can call to modify the $PBS_NODEFILE before it calls pbsdsh?
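
    In script form, the idea would be something like this (a sketch, assuming hypothetically that pbsdsh honored a modified $PBS_NODEFILE; the file name is illustrative):

    # Duplicate entries are adjacent in the node file, so uniq suffices.
    uniq "$PBS_NODEFILE" > "$TMPDIR/nodefile.uniq"

    # Point PBS_NODEFILE at the deduplicated copy and launch one task per node.
    PBS_NODEFILE="$TMPDIR/nodefile.uniq" pbsdsh -- bash -c 'hostname'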

    > The program is intended to be used during MPI integration activities, and not by end-users

    This is the primary use case, but it can indeed be called directly and function properly. There is an example in the PBS Pro Admin Guide 2022.2, section 8.5.8.3 "Example Job", where pbs_tmrsh is called directly within the job script.

    > Is there any way to add some kind of pre-execution script that qsub can call to modify the $PBS_NODEFILE prior to it calling pbsdsh?

    Pre-execution scripts can be created using hooks such as execjob_begin or execjob_launch, though in my tests, modifying $PBS_NODEFILE had no effect on pbsdsh. It appears the documentation should be clarified here, as $PBS_NODEFILE is apparently not used as the source node list for pbsdsh. I will create a ticket for that.
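
    For reference, execution hooks are registered on the server with qmgr, roughly like this (the hook name and file here are hypothetical; the hook body itself is written in Python against the PBS hooks API):

    qmgr -c "create hook fix_nodefile"
    qmgr -c "set hook fix_nodefile event = execjob_launch"
    qmgr -c "import hook fix_nodefile application/x-python default fix_nodefile.py"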

    My recommendation would be to use pbs_tmrsh, ssh, or pdsh.
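
    As a sketch, the job script itself could launch one task per distinct node, e.g. with pbs_tmrsh (untested here, and assuming a multi-chunk request such as -l select=8:ngpus=1 as above):

    #!/bin/bash
    # Run one task on each distinct node in the allocation.
    for node in $(uniq "$PBS_NODEFILE"); do
        pbs_tmrsh "$node" hostname &
    done
    wait  # Wait for all per-node tasks to complete.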

    Thanks!

    Joshua