Problems with Intel MPI Feko execution on some hardware configurations

Iakov Zhabitskiy, Altair Employee

Starting with Feko version 2022, a newer Intel MPI library (2021.2) is packaged instead of the previous 2018.0.

The 2021.2 IMPI version has an advanced mechanism for automatically identifying the available hardware and fabrics, which allows it to achieve the best possible performance on each specific hardware configuration.
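
For diagnosis, it can help to see which fabric and provider IMPI actually selects at startup. A minimal sketch, assuming the standard Intel MPI `I_MPI_DEBUG` variable and the usual `runfeko` launcher (`model.cfx` is a placeholder model name; the exact debug output differs between IMPI versions):

```
# Linux (bash); on Windows use "set" instead of "export".
# Debug level 5 prints the selected libfabric provider at startup.
export I_MPI_DEBUG=5
runfeko model.cfx -np 8
```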

Some side effects have been observed after this change:

  1. Feko does not start and there are MPI library errors about UCX and/or fabrics.
  2. Feko hangs during execution (usually during the LU factorisation, though it can happen at other stages of the computation as well). Observed only on Linux clusters with InfiniBand.
  3. A Feko simulation (over multiple frequencies) runs out of memory during the `Backward substitution for FEM coupling matrix` phase. Observed on both Windows and Linux systems with some hardware configurations.

The first problem can be fixed by updating the fabric-related drivers: the new IMPI version relies on mechanisms that are not available in older driver versions.
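
On Linux, the installed fabric stack versions can be checked before updating. A quick sketch, assuming the standard UCX and Mellanox OFED utilities are present (availability depends on the cluster installation):

```
# Print the installed UCX version (utility shipped with UCX):
ucx_info -v

# Print the Mellanox OFED driver stack version, if OFED is installed:
ofed_info -s
```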

The second and third problems have different root causes, but both can be avoided by setting the environment variable `I_MPI_FABRICS=ofi`.
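
A minimal sketch of applying this workaround, again assuming the usual `runfeko` launcher and a placeholder model name:

```
# Linux (bash): set the variable for the current shell session, then launch Feko.
export I_MPI_FABRICS=ofi
runfeko model.cfx -np 8

# Windows (cmd): the equivalent commands.
set I_MPI_FABRICS=ofi
runfeko model.cfx -np 8
```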

Important note: do not set the `I_MPI_FABRICS` environment variable by default, since it can degrade the performance of parallel Feko runs.
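
One way to follow this advice on Linux is to set the variable for the affected run only, instead of exporting it globally (same assumptions as above):

```
# Applies I_MPI_FABRICS=ofi to this invocation only; other parallel
# Feko runs keep the automatic fabric selection.
I_MPI_FABRICS=ofi runfeko model.cfx -np 8
```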