HPC Optistruct 2020.1

Christian Fritz_22258 · New Altair Community Member
edited February 2021 in Community Q&A

I am trying to run a parallel OptiStruct simulation on a Linux cluster:


28-way Haswell-EP nodes with an Infiniband FDR14 interconnect and 2 hardware threads per physical core

I want to start a simulation on 3 nodes with 28 cores each. Our system administrator uses SLURM and sbatch, so I wrote an sbatch script:

#!/bin/bash
#SBATCH -J Phantom
#SBATCH -o ./%x.%j.%N.out
#SBATCH -D ./
#SBATCH --get-user-env
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std
#SBATCH --qos=cm2_std
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28
#SBATCH --mail-type=end
#SBATCH --export=NONE
#SBATCH --time=8:00:00

module load slurm_setup

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/2018/altair/hwsolvers/common/bin/linux64/

mpiexec -n $SLURM_NTASKS ~/2020/altair/hwsolvers/optistruct/bin/linux64/optistruct_2020.1_linux64_impi ./Phantom_v2.fem -ddmmode

This starts OptiStruct as the solver, but it stops with an error message:


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 28 PID 51302 RUNNING AT i22r04c03s07
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

and so on.

Does anybody know what the source of the error is, or which problems could cause this?

Many thanks in advance

Christian

Answers

  • Q.Nguyen-Dai · Altair Community Member
    edited February 2021

    In your script I see both "2018" and "2020", etc. Are you mixing two different versions of OptiStruct?

    Does the "mpiexec" executable in your command come from the Altair installation?

    In your place, I would first test SMP mode on a single node (-cpu 10, for example).
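    For illustration, a single-node SMP test job might look like the sketch below. The solver path and input file are taken from the original script; the `-cpu 10` option is the one suggested above, and the SLURM settings are assumptions for a minimal test:

    ```shell
    #!/bin/bash
    #SBATCH -J Phantom_smp
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=10
    #SBATCH --time=2:00:00

    # Run OptiStruct in SMP (shared-memory) mode on one node only,
    # without mpiexec. If this works, the MPI/DDM setup is the problem,
    # not the solver installation.
    ~/2020/altair/hwsolvers/optistruct/bin/linux64/optistruct_2020.1_linux64_impi \
        ./Phantom_v2.fem -cpu 10
    ```

    If this single-node run succeeds, the issue is likely in the MPI launch rather than in the model or the license.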

  • Christian Fritz_22258 · New Altair Community Member
    edited February 2021

    Hi, thanks for your answer.

    The mpiexec uses the MPI installation on the Linux cluster. I think this is needed because the Linux cluster runs SLURM batch jobs. The 2020 was a typo; there is also an older installation, but it should just use the 2020 version.

    The serial run is working, but no parallelization has been possible so far :(.


    Thanks in advance,

    Christian

  • Q.Nguyen-Dai · Altair Community Member
    edited February 2021

    Sometimes you can use an "external" MPI (with its mpirun), but sometimes you have to use the "internal" MPI that is shipped with the software.

    SMP is also a parallelization technology; it uses several cores of the same CPU or the same node.

    In Altair's documentation I see examples that call the launch script, not the executable directly as in your script. Maybe try that?
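    As a sketch of what that could look like: instead of calling the `optistruct_2020.1_linux64_impi` binary with the cluster's mpiexec, you call Altair's launch script and let it start the bundled MPI itself. The script path below is an assumption based on the installation layout in the original post; the input file and node counts are taken from the original sbatch script:

    ```shell
    #!/bin/bash
    #SBATCH -J Phantom_ddm
    #SBATCH --nodes=3
    #SBATCH --ntasks-per-node=28
    #SBATCH --time=8:00:00

    # Call the Altair launch script instead of the solver executable.
    # The script sets up the MPI environment shipped with OptiStruct,
    # so no external mpiexec is needed here. Path is an assumption.
    ~/2020/altair/scripts/optistruct ./Phantom_v2.fem -ddm -np $SLURM_NTASKS
    ```

    This keeps the solver and its MPI runtime from the same installation, which avoids the version mixing visible in the original script.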