Altair Radioss Parallel Computing Methods

Pierre-Christophe Masson_20773
edited December 2023 in Altair HyperWorks

Altair Radioss is a leading code in terms of scalability; its efficiency has been demonstrated on up to several thousand cores.

Altair Radioss offers three parallel computing methods:

  • Shared-Memory Parallelism (SMP).
  • Single Program Multiple Data (SPMD) or Massively Parallel Program (MPP).
  • Hybrid Massively Parallel Program (HMPP), which combines the two methods above.

This article presents the different approaches and gives best practices for Radioss performance.

Altair Radioss Parallel Computing Methods

SMP method: a single process runs the computation on all the specified cores (usually called threads in this context).

Altair Compute Console option or Command Line Argument: -nt <NumThreads>

SPMD (MPP) method: the model is split into as many domains as there are specified cores; each domain is solved by one process. Message Passing Interface (MPI) software handles the communication between domains. Each process uses one thread.

Altair Compute Console option or Command Line Argument: -np <NumDomains>

HMPP method: the model is split into separate domains and each domain is solved using a given number of threads. Because there are fewer domains than with the SPMD method, less communication is needed, which can be useful when using a large number of cores or when the network between servers is slow.

Altair Compute Console option or Command Line Argument:

-np <NumDomains> -nt <NumThreads>
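
The mapping between the three methods and their options can be summarized in a short Python sketch. The function `radioss_parallel_options` and its arguments are hypothetical helpers introduced for illustration; only the `-np` and `-nt` options come from the text above.

```python
def radioss_parallel_options(method, num_domains=1, num_threads=1):
    """Return the -np/-nt option string for a parallel method (illustrative helper)."""
    method = method.upper()
    if method == "SMP":
        # One process, several threads.
        return f"-nt {num_threads}"
    if method in ("SPMD", "MPP"):
        # One single-threaded process per domain.
        return f"-np {num_domains}"
    if method == "HMPP":
        # Several domains, each solved with several threads.
        return f"-np {num_domains} -nt {num_threads}"
    raise ValueError(f"unknown parallel method: {method}")


for name, domains, threads in [("SMP", 1, 4), ("SPMD", 32, 1), ("HMPP", 16, 2)]:
    print(name, "->", radioss_parallel_options(name, domains, threads))
```

The returned string is only the parallel part of the command line; how it is passed to the solver or to Altair Compute Console is not covered by this sketch.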

 

When to use SMP method (-nt #threads)

SMP method is well suited:

  • For machines with a limited number of (physical) cores, which mainly means computations on laptops.
  • For small models (fewer than 10,000 elements).

This parallelization method is efficient up to 4 cores; it is not recommended with more than 8 cores.

 

When to use SPMD (MPP) method (-np #cores)

SPMD method is well suited:

  • For machines with a high number of (physical) cores (up to 256 - 512 cores)
  • For large models

For standard crash applications, the SPMD method should ideally have at least 10,000 elements per domain (5,000 at the very minimum); below that, the HMPP method is recommended.
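
As a rough illustration, the elements-per-domain rule can be checked with a few lines of Python. The function name `spmd_domain_load` and the verdict labels are assumptions made for this sketch; the two thresholds are the ones quoted above and apply to standard crash models only.

```python
IDEAL_ELEMENTS_PER_DOMAIN = 10_000    # ideal lower bound quoted in the text
MINIMUM_ELEMENTS_PER_DOMAIN = 5_000   # absolute minimum quoted in the text


def spmd_domain_load(num_elements, num_domains):
    """Return (elements per domain, verdict) for an SPMD run (illustrative helper)."""
    per_domain = num_elements / num_domains
    if per_domain >= IDEAL_ELEMENTS_PER_DOMAIN:
        verdict = "ideal"
    elif per_domain >= MINIMUM_ELEMENTS_PER_DOMAIN:
        verdict = "below ideal"
    else:
        verdict = "below minimum"
    return per_domain, verdict


print(spmd_domain_load(320_000, 32))  # 10,000 elements per domain
print(spmd_domain_load(150_000, 32))  # fewer than 5,000 elements per domain
```

A "below ideal" or "below minimum" verdict suggests reducing the number of domains, for instance by switching to HMPP.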

The SPMD method is generally the best choice up to 256 cores.

Remark: For FSI (ALE or Euler) or SPH simulations, there are no such recommendations; suitable settings have to be determined by testing on the model itself.

 

When to use Hybrid (HMPP) method (-np #domains -nt #threads)

HMPP method is recommended:

  • With a very high number of cores (512+); in this case, 2 or 4 threads per domain should be used.
  • In this case, it is possible to specify 1 SPMD domain per node and a number of threads per domain equal to the number of (physical) cores per socket.
  • When the number of elements per SPMD domain (core) drops below 10,000.

Remark: Finite Volume Method (FVM) benefits from using 4 threads.
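
The guidance of the three "When to use" sections can be condensed into a small decision helper. This is a deliberately simplified sketch, not an official selection rule: the function name `suggest_method` is hypothetical, the 256 - 512 core range (where the text leaves room for either SPMD or HMPP) is treated as SPMD, and the thresholds only apply to standard crash models, not to FSI or SPH.

```python
def suggest_method(physical_cores, num_elements):
    """Suggest SMP, SPMD or HMPP from the rules of thumb above (simplified sketch)."""
    # SMP: few cores (efficient up to 4) or small models (< 10,000 elements).
    if physical_cores <= 4 or num_elements < 10_000:
        return "SMP"
    # HMPP: very high core counts (512+).
    if physical_cores >= 512:
        return "HMPP"
    # HMPP: fewer than 10,000 elements per SPMD domain (one domain per core).
    if num_elements / physical_cores < 10_000:
        return "HMPP"
    # Otherwise SPMD, generally the best choice up to 256 cores.
    return "SPMD"


for cores, elements in [(4, 500_000), (32, 150_000), (128, 2_000_000), (1024, 20_000_000)]:
    print(cores, "cores,", elements, "elements ->", suggest_method(cores, elements))
```

The result is only a starting point; timing a short run with each candidate setup remains the reliable way to decide.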

 

General Recommendations

Avoid hyperthreading and do not count logical cores.

Consider physical cores only.

The product of the number of domains (-np) and the number of threads (-nt) should match the number of available physical cores (and the number of cores one wants to use).
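
To make the product rule concrete, the sketch below lists every (-np, -nt) combination that exactly fills a given number of physical cores. The function name `hybrid_layouts` is hypothetical. The core count is passed in explicitly because Python's `os.cpu_count()` reports logical CPUs, which would include hyperthreads.

```python
def hybrid_layouts(physical_cores):
    """List every (num_domains, num_threads) pair whose product equals physical_cores."""
    return [
        (physical_cores // num_threads, num_threads)
        for num_threads in range(1, physical_cores + 1)
        if physical_cores % num_threads == 0
    ]


for num_domains, num_threads in hybrid_layouts(32):
    print(f"-np {num_domains} -nt {num_threads}")
```

For 32 physical cores this yields six layouts, from 32 single-threaded domains down to 1 domain with 32 threads; the recommendations above narrow the choice to the ones with a small number of threads per domain.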

Each node should be dedicated to a single job.

A job may run faster on 2 servers using half of the available cores on each (say 16 cores of a 32-core server) than on 1 server using all 32 cores. Both servers are then only half busy, however, so a compromise has to be made between computation time and server occupancy. In particular, it is more efficient in this case to run 2 jobs in parallel on each server than to run them sequentially on the 2 servers.

For submission over several servers, ensure that the quality of connections between them is high enough.

Use affinity (environment variable KMP_AFFINITY) to pin MPI processes and their threads to cores.
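
The variable has to be set in the environment from which the solver is launched. A minimal Python sketch, under stated assumptions: the helper `solver_environment` is hypothetical, and the value `scatter` is only an example of a setting accepted by the Intel OpenMP runtime, not a value recommended by this article.

```python
import os


def solver_environment(affinity="scatter"):
    """Return a copy of the current environment with KMP_AFFINITY set (illustrative)."""
    env = dict(os.environ)          # copy, so the calling process is left untouched
    env["KMP_AFFINITY"] = affinity  # example value; choose the setting suited to the machine
    return env


env = solver_environment()
print("KMP_AFFINITY =", env["KMP_AFFINITY"])
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)` when the solver is started from a script; the launch command itself is left out here because it depends on the installation.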

 

Examples

Example 1: if a server has 2 sockets with 16 physical cores each, use a maximum of 32 cores per server.

Example 2: for a model of 150,000 elements and a simulation on 1 server, use options -np 16 -nt 2 (or -np 8 -nt 4) to have ~10,000 elements per domain.
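
The arithmetic behind Example 2 can be reproduced as follows. The sketch assumes the 32-core server of Example 1, which matches both option sets since 16 x 2 = 8 x 4 = 32; the helper `elements_per_domain` is a hypothetical name.

```python
def elements_per_domain(num_elements, num_domains):
    """Average number of elements handled by each domain."""
    return num_elements / num_domains


num_elements = 150_000
for num_domains, num_threads in [(16, 2), (8, 4)]:
    cores = num_domains * num_threads
    load = elements_per_domain(num_elements, num_domains)
    print(f"-np {num_domains} -nt {num_threads}: {cores} cores, {load:,.0f} elements per domain")
```

With -np 16 each domain holds 9,375 elements, close to the 10,000 target; with -np 8 each domain holds 18,750 elements. Pure SPMD on the same server (-np 32) would leave fewer than 5,000 elements per domain.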

 

 

For more information about the parallelization methods, please refer to the help page below:

https://help.altair.com/hwsolvers/rad/topics/solvers/rad/theory_radioss_parallelization_c.htm