HPC Optistruct 2020.1
I am trying to run a parallel OptiStruct simulation on a Linux cluster:
28-way Haswell-EP nodes with InfiniBand FDR14 interconnect and 2 hardware threads per physical core.
I want to start a simulation on 3 nodes with 28 cores each. Our system administrator uses SLURM and sbatch, so I wrote the following sbatch script:
#!/bin/bash
#SBATCH -J Phantom
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std
#SBATCH --qos=cm2_std
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28
#SBATCH --mail-type=end
#SBATCH --export=NONE
#SBATCH --time=8:00:00
module load slurm_setup
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/2018/altair/hwsolvers/common/bin/linux64/
mpiexec -n $SLURM_NTASKS ~/2020/altair/hwsolvers/optistruct/bin/linux64/optistruct_2020.1_linux64_impi ./Phantom_v2.fem -ddmmode
This starts OptiStruct as the solver, but it stops with an error message:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 28 PID 51302 RUNNING AT i22r04c03s07
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
and so on.
Does anybody know where the source of my error is, or which problems could be causing this?
Many thanks in advance
Christian
Answers
-
In your script I see both "2018" and "2020", etc. Are you mixing two different versions of OptiStruct?
Does the "mpiexec" executable in your command come from the Altair installation?
In your place, I would first test SMP mode on a single node (-cpu 10, for example), as in the sketch below.
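A minimal single-node SMP test could look like this sketch. The SMP executable name (without the _impi suffix) and the -cpu flag are assumptions based on the script and the suggestion above; adjust cluster, partition and paths to your site:

#!/bin/bash
#SBATCH -J Phantom_smp
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std
#SBATCH --nodes=1            # SMP test on a single node only
#SBATCH --ntasks=1           # one solver process ...
#SBATCH --cpus-per-task=10   # ... with 10 threads
#SBATCH --time=2:00:00
module load slurm_setup
# Run the solver directly, without mpiexec: pure shared-memory parallelism.
~/2020/altair/hwsolvers/optistruct/bin/linux64/optistruct_2020.1_linux64 ./Phantom_v2.fem -cpu 10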
-
Hi, thanks for your answer.
The mpiexec comes from the MPI installation on the Linux cluster. I think this is needed because the cluster is run through SLURM batch jobs. The version mix in the paths was a typo; there is also an older installation, but it should only use the 2020 version.
A serial run works, but no parallel run has been possible so far.
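A quick way to check which mpiexec is actually picked up, and which MPI the solver binary was linked against (a purely diagnostic sketch; the executable path is the one from the script above):

which mpiexec
mpiexec --version
ldd ~/2020/altair/hwsolvers/optistruct/bin/linux64/optistruct_2020.1_linux64_impi | grep -i mpi

If the MPI library the binary was built against and the MPI behind mpiexec do not match, that alone can kill the ranks in this way.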
Thanks in advance,
Christian
-
Sometimes you can use an "external" MPI (with its own mpirun), but sometimes you have to use the "internal" MPI that is shipped with the software.
SMP is also a parallel technology, but it only uses several cores of the same CPU or the same node.
In Altair's documentation I see examples that call the launch script, not the solver executable as in your script. Maybe try that, for example as sketched below?
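A hedged sketch of such a launch, assuming the launch script sits at ~/2020/altair/scripts/optistruct and accepts -ddm, -np, -nt, -mpi and -hostfile switches; check the exact script path and option names in the OptiStruct 2020.1 documentation of your installation:

#!/bin/bash
#SBATCH -J Phantom_ddm
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28
#SBATCH --time=8:00:00
module load slurm_setup
# Tell the launch script which nodes SLURM assigned to the job.
scontrol show hostnames $SLURM_JOB_NODELIST > hostfile.txt
# Let the launch script start its bundled Intel MPI itself instead of calling
# the cluster-wide mpiexec: one MPI process per node (-np 3) with 28 SMP
# threads each (-nt 28). Option names are assumptions, not verified here.
~/2020/altair/scripts/optistruct ./Phantom_v2.fem -ddm -np 3 -nt 28 -mpi i -hostfile ./hostfile.txt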