How to speed up simulations in RADIOSS?

Neeraj Kumar
Altair Community Member
edited April 2022 in Community Q&A

My workstation has 16 cores and 32 threads. What values should I set for -nt, -np, -mpi and -cores in Altair Compute Console?

Currently I am using -np 16, but the solver often fails midway. I have attached the error file; please take a look.

Thanks.

Best Answer

  • PaulAltair
    Altair Employee
    edited April 2022 Answer ✓

    The error is not a result of the number of processes used; it just states that the job failed. You can check your 0001.out file to see why.

    For speed-up: -np is the number of processes and -nt is the number of threads per process. Radioss uses both, so your total core usage will be np × nt.

    If you use -np, a rule of thumb is to have a minimum of approximately 5k elements per process to see good speed-up.

    So in your case, -np 16 would need at least about 80k elements (16 × 5k); if your model is smaller than that, a hybrid combination of processes and threads might be better.

    You have 16 physical cores; the extra hardware threads (32 in total) from hyperthreading won't help you, so aim for np × nt = 16.

    So you could use -np 8 -nt 2, -np 4 -nt 4, -np 2 -nt 8, or just -nt 16, and see which works best for you.

    If your model is stable and running well, you could also try -sp (the single-precision version).
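The core arithmetic in this answer can be sketched as a small Python helper. It is hypothetical (not an Altair tool) and rests on assumptions taken from the answer, not from Radioss documentation: total core usage is np × nt, only the 16 physical cores count, and the ~5k elements-per-process rule of thumb is applied as a hard cutoff, although the answer only offers it as guidance. Benchmarking the candidates is still the real test.

```python
# Hypothetical helper: enumerate -np/-nt layouts for a Radioss run.
# Assumptions (from the forum answer, not from Radioss docs):
#   * total cores used = np * nt
#   * only physical cores count (hyperthreads are ignored)
#   * -np pays off only with roughly >= 5,000 elements per process

MIN_ELEMENTS_PER_PROCESS = 5_000  # rule-of-thumb value quoted in the answer


def candidate_layouts(physical_cores: int, n_elements: int) -> list[tuple[int, int]]:
    """Return (np, nt) pairs with np * nt == physical_cores, largest np first.

    Layouts with fewer than MIN_ELEMENTS_PER_PROCESS elements per process are
    dropped; np == 1 (pure threading, i.e. just -nt) is always kept.
    """
    layouts = []
    for np_ in range(physical_cores, 0, -1):
        if physical_cores % np_ != 0:
            continue  # np must divide the core count so np * nt uses every core
        nt = physical_cores // np_
        if np_ == 1 or n_elements / np_ >= MIN_ELEMENTS_PER_PROCESS:
            layouts.append((np_, nt))
    return layouts


def as_flags(np_: int, nt: int) -> str:
    """Format a layout the way it is written in the answer."""
    return f"-nt {nt}" if np_ == 1 else f"-np {np_} -nt {nt}"


if __name__ == "__main__":
    # Example: 16 physical cores and a hypothetical 50k-element model.
    for np_, nt in candidate_layouts(16, 50_000):
        print(as_flags(np_, nt))
```

For 16 cores and a hypothetical 50k-element model, -np 16 is filtered out (about 3.1k elements per process) and the remaining layouts are the four listed in the answer: -np 8 -nt 2, -np 4 -nt 4, -np 2 -nt 8 and -nt 16.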

Answers

  • Neeraj Kumar
    Altair Community Member
    edited April 2022

    I ran the model with the -np=16 setting, but again it suddenly failed midway. There is no error in the .out file. I am attaching the .out file and the Altair Compute Console error message; please take a look. However, the solver view panel shows "Segmentation Violation".

  • PaulAltair
    Altair Employee
    edited April 2022

    Unfortunately, it isn't possible to know what is causing the segmentation violation; segmentation errors don't give any information about what caused them. All we know is that something went wrong: it could be purely model-based, or a bug in Radioss triggered by something (e.g. a memory-handling error).

    If you have restart files (they would have the extensions '0001_0001.rst' to '0001_0016.rst'), you could try restarting the run to see if it crashes in the same place. Copy your jobname_0001.rad file, rename the copy jobname_0002.rad, and submit that 0002.rad file in ACC (it would also need to be run with -np 16).

    You could also try running it again from the start with -np 8 -nt 2 to see if it behaves any differently, but it may not help. Finally, the 2022 release is available now; you could try that, because if it is a Radioss bug, it may have been fixed.

    Really, you have reached the point where you need to share the model with someone so they can take a closer look. I understand you can't share it here, but if you contact Altair Support, they can look at it for you, either by sharing the model or in a web meeting.
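The manual restart steps in this reply could be scripted. The Python sketch below is hypothetical (not an Altair utility) and assumes two things taken only from the reply: the restart files follow the pattern jobname_0001_NNNN.rst with one file per process (NNNN = 0001 to the -np value), and a plain copy of jobname_0001.rad renamed to jobname_0002.rad is all the restart needs. "jobname" is a placeholder. Check the Radioss documentation before relying on it. The script only prepares the files; you still submit the copy in ACC with the same -np as the original run.

```python
# Hypothetical helper for the restart procedure described in the reply.
# Assumptions: restart files are named <jobname>_0001_<NNNN>.rst, one per
# -np process (NNNN = 0001..np), and the restart input is a plain copy of
# <jobname>_0001.rad saved as <jobname>_0002.rad. "jobname" is a placeholder.
import shutil
from pathlib import Path


def missing_restart_files(workdir: Path, jobname: str, n_processes: int) -> list[str]:
    """List expected <jobname>_0001_NNNN.rst files that are absent from workdir."""
    expected = [f"{jobname}_0001_{i:04d}.rst" for i in range(1, n_processes + 1)]
    return [name for name in expected if not (workdir / name).exists()]


def prepare_restart(workdir: Path, jobname: str, n_processes: int) -> Path:
    """Copy <jobname>_0001.rad to <jobname>_0002.rad once all restart files exist."""
    missing = missing_restart_files(workdir, jobname, n_processes)
    if missing:
        raise FileNotFoundError(f"missing restart files: {missing}")
    src = workdir / f"{jobname}_0001.rad"
    dst = workdir / f"{jobname}_0002.rad"
    shutil.copyfile(src, dst)  # plain copy; the file contents are not edited
    return dst  # submit this file in ACC with the same -np as the original run
```

Checking for all 16 restart files first mirrors the reply's point that the restart must be run with the same -np 16 as the original job.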