Radioss jobs not being run in parallel correctly during multi-execution

Ingeniorator
Ingeniorator New Altair Community Member
edited June 2022 in Community Q&A

Hi everyone,

 

I'm running a very basic Radioss crash model in conjunction with morphing in HyperStudy to find the optimal shape of a lattice. The model results are as expected; however, when I try to run several jobs at the same time with the solver arguments ${file} -np 2 -nt 1, only two of my 32 cores are loaded at 100%, regardless of the specified number of parallel runs. If instead 16 jobs are run with ${file} -np 2, all 32 cores work at 100%, but with no noticeable speedup whatsoever compared to 32 consecutive single-job runs. I don't suspect storage write speed to be the issue, as it sits at 0-2% load, and there's plenty of RAM left.

 

The solver arguments -np x -nt 1 turned out to be the fastest for stand-alone Radioss jobs, so it's odd that they don't work as expected with HyperStudy. Would anyone have any insight?

Answers

  • Michael Herve_21439
    Michael Herve_21439
    Altair Employee
    edited July 2021

    Hello Ingeniorator,

     

    if you set "-np 2 -nt 1" for running RADIOSS, each RADIOSS job should run 2 processes, each one using one thread.

    If you set "-np 2" only, each process should use one core, unless the environment variable OMP_NUM_THREADS is defined, in which case each process uses as many cores as the value of OMP_NUM_THREADS.

     

    Having said that, I'm surprised that only 2 cores are used when you set "-np 2 -nt 1" and run several jobs in parallel. For such small jobs, you may try "-nt 2" instead and see whether the RADIOSS jobs are balanced better over your cores.

    When using "-nt 2" only, RADIOSS launches only one process, which uses 2 threads.
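    The three modes above can be sketched as follows (a hedged sketch: "radioss" and the model name are placeholders for however your installation actually launches the solver; only the -np/-nt semantics matter here):

    ```shell
    # Placeholder launcher and model name -- adjust to your installation.
    radioss model_0000.rad -np 2 -nt 1   # 2 MPI processes x 1 OpenMP thread each = 2 cores
    radioss model_0000.rad -np 2         # threads per process follow OMP_NUM_THREADS (1 if unset)
    radioss model_0000.rad -nt 2         # 1 process x 2 OpenMP threads = 2 cores
    ```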

     

    Also, please note that the number of multi-execution jobs depends on the approach/algorithm you're using.

    For instance, a GRSM optimization generally takes 5 to 7 runs for the first iteration, then 2 runs for each subsequent iteration.

    So whatever value you set for multi-execution, once GRSM starts the 2nd iteration it won't run more than 2 jobs at the same time.

     

    Best Regards,

    Michael

  • Ingeniorator
    Ingeniorator New Altair Community Member
    edited March 2022

    Hello Michael,

    sorry for getting back to you so late. The issue persists when running RADIOSS jobs via the Compute Console as well. For example, when starting two jobs with the same settings, -np 4 -nt 1, both jobs get assigned to the same four cores instead of eight cores in total. Starting one job with -np 4 (no -nt setting), all 32 cores are loaded at 100%. Using -np 1 -nt 4 with two jobs again loads only 4 cores in total, not 8 as intended; the same goes for using only -nt 4. Is this due to some environment variable or MPI setting? Which method must be used so that several jobs can run in parallel? In addition, the load gets spread across both CPU sockets, introducing significant communication delays, but that's likely due to Windows not handling multi-socket systems efficiently.
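    One way to test whether thread pinning is the culprit is to assign each job its own core set by hand with the Windows start /affinity switch (a hypothetical workaround sketch; "radioss" and the model names stand in for however you normally launch the jobs):

    ```shell
    :: Pin each job to its own affinity mask so they no longer fight over
    :: the same cores: 0x0F = cores 0-3, 0xF0 = cores 4-7.
    start /affinity 0x0F radioss model_a_0000.rad -np 4 -nt 1
    start /affinity 0xF0 radioss model_b_0000.rad -np 4 -nt 1
    ```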

     


  • Ingeniorator
    Ingeniorator New Altair Community Member
    edited June 2022

    For future reference, if the environment variable KMP_AFFINITY is set to "disabled", the jobs are distributed correctly.
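    For anyone hitting the same problem, a minimal sketch of setting the variable on Windows before launching the jobs (standard cmd commands; nothing RADIOSS-specific is assumed beyond the variable name):

    ```shell
    :: Disable Intel OpenMP processor binding for the current session:
    set KMP_AFFINITY=disabled
    :: Or persist it across sessions (takes effect in newly opened consoles):
    setx KMP_AFFINITY disabled
    ```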