AcuSolve in HPC

Unknown
edited November 2022 in Community Q&A

Hi experts,

I have been looking for tutorials on running AcuSolve on a high-performance computing (HPC) facility, and I came across the acuSub command. I would be grateful if anyone could share their experience of running on HPC, or point me to documentation or a website that covers using acuSub in the terminal. I have tried acuSub -h, but I am looking for practical, detailed usage of the command along with some examples.

It would be helpful to know about using a batch script, setting the parallel mode, using SLURM to submit jobs, etc.

Thanks.

Answers

  • acupro
    Altair Employee
    edited November 2022

    There's really no documentation for acuSub (other than the -h option), as it's not a primary script.  Essentially, acuSub creates the submit script for whichever supported scheduler you indicate, and in some cases submits it for you.  You mentioned you use SLURM.  Let's assume your problem name is test1 (and thus your input file is test1.inp), and that you have 4 compute nodes to use, each with 24 cores, so 96 cores in total.  We'll assume the queue you want to use is workq.  From the job directory, where the .inp file and MESH.DIR reside, issue:

    acuSub -pb test1 -np 96 -ppn 24 -pq workq -sched slurm

    That would be the minimum needed to create the submit script for Slurm.  By default acuSub also submits the job, because -sub defaults to true, and the job name defaults to the problem name (test1).
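
The processor count in the command above is simply nodes times cores per node. A minimal sketch of that arithmetic in shell, which only prints the command so it can be checked before use. The variable names are illustrative, and the queue option is written as -pq on the assumption that it takes a leading dash like the other options:

```shell
#!/bin/sh
# Values from the example in this answer; adjust them to your cluster.
nodes=4                 # number of compute nodes
ppn=24                  # cores (processes) per node
np=$((nodes * ppn))     # total processes: 4 * 24 = 96

# Build and print the command instead of running it.
# Assumption: the queue option is spelled -pq.
cmd="acuSub -pb test1 -np $np -ppn $ppn -pq workq -sched slurm"
echo "$cmd"
```

For a cluster with 4 nodes of 128 cores each, setting ppn=128 gives np=512 with the same arithmetic.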

  • Unknown
    edited November 2022

    Hi acupro,

    I am able to generate a job submission script using acuSub. I have an HPC system with 4 nodes of 128 cores each, so 512 cores in total. I use Slurm for job submission.

    I am not able to run the problem on multiple nodes. I edited the acusim.cnf file and set num_processors = 512, then checked the HPC system's display of which compute nodes are in use. When I check the log file, I see the following message:

    [image: log file message]

    I think the job is running only on node cn456. Even when I edit the acusim.cnf file and set 128 processors, the simulation does not run.

    I am able to run a simulation on a single node, but I want a single problem to run on multiple nodes.

     

    Thanks!

  • acupro
    Altair Employee
    edited November 2022

    You'll likely need to work with your IT support to develop the proper Slurm script.
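
As a starting point for that conversation, here is a hedged sketch of a Slurm batch script header for 4 nodes with 128 tasks each, plus a check that the allocation really spans several nodes. The #SBATCH options and SLURM_* variables are standard Slurm. The partition name workq, the helper count_hosts, and the commented acuRun line are assumptions for illustration only, not a verified AcuSolve launch recipe:

```shell
#!/bin/bash
# Request 4 nodes with 128 tasks each (512 tasks in total).
# The partition name below is hypothetical; use your site's name.
#SBATCH --job-name=test1
#SBATCH --partition=workq
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=128

# Summarise hostnames read from stdin (one per line) as "host count" lines.
count_hosts() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Inside a Slurm allocation, report what was actually granted.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    echo "nodes: ${SLURM_JOB_NUM_NODES:-?}  tasks: ${SLURM_NTASKS:-?}"
    echo "node list: ${SLURM_JOB_NODELIST:-?}"
    # One line per host; here 4 hosts with 128 tasks each would be expected.
    srun hostname | count_hosts
fi

# The solver launch would follow here. Its options are site-specific, so it
# is left as a comment (assumed form): acuRun -pb test1 -np "$SLURM_NTASKS"
```

If the check lists only one host (for example only cn456), the job was granted a single node, and the resource request needs fixing before any solver setting such as num_processors can take effect across nodes.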