Problems with Radioss on an HPC Cluster

pohan
pohan Altair Community Member
edited October 2020 in Community Q&A

Hello 

I have just started using Radioss 2017 on an HPC cluster. For the SLURM file I use the following syntax:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00
#SBATCH --job-name=altair-radioss
#SBATCH --mem=1G

# Setup input file(s) following your case
INPUT1=v14_run_0000.rad
INPUT2=v14_run_0001.rad

# Setup custom library path if necessary
export LD_LIBRARY_PATH="$HOME/my_user_library:$LD_LIBRARY_PATH"

# Load Altair module
module load altair/2017.2

# Run Radioss on a single core
if [[ ($SLURM_JOB_NUM_NODES == 1) && ($SLURM_NTASKS_PER_NODE == 1) ]]; then
  radioss "$INPUT1"
# Run Radioss on many cores of one node
elif [[ $SLURM_JOB_NUM_NODES == 1 ]]; then
  radioss -nthread $SLURM_NTASKS_PER_NODE "$INPUT1"
# Run Radioss on many cores of many nodes
else
  module load intel/2013/intel-mpi
  $ALTAIR_HOME/hwsolvers/radioss/bin/linux64/s_2017.2_linux64 -i "$INPUT1" -nt 1 -np $SLURM_JOB_NUM_NODES
  mpiexec $ALTAIR_HOME/hwsolvers/radioss/bin/linux64/e_2017.2_linux64_impi -i "$INPUT2" -nt 1
fi

It worked. But when I changed 

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

It did not work.

Could you please explain why?

Thank you


Answers

  • Altair Forum User
    Altair Forum User
    Altair Employee
    edited November 2017

    Hi,

    This could be due to system limitations, as you are trying to invoke more CPUs.

    I recommend contacting the Slurm support team for a clearer explanation.
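
    As a first check, you could also look at what your partition actually allows (standard SLURM commands; the partition name here is just a placeholder):

     # Show the limits of the partition you submit to (replace "compute" with your partition name)
     scontrol show partition compute

     # Show node count, CPUs per node and memory per node for each partition
     sinfo -o "%P %D %c %m"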

  • pohan
    pohan Altair Community Member
    edited November 2017

    Thank you, George.

    Could you please give me the contact details of the Slurm team?

  • Altair Forum User
    Altair Forum User
    Altair Employee
    edited November 2017

    Sorry Pohan, Slurm is not an Altair product. Please look for their support team in online resources.

     

  • Andy_20955
    Andy_20955 New Altair Community Member
    edited November 2017

    Hi Pohan,

    I would suggest verifying that your Radioss MPI submission command works correctly outside of Slurm.

    First, I would recommend using the Intel MPI that ships with RADIOSS instead of another one installed on your system. It is located in:

     

    altair/hw/2017/altair/mpi/linux64/intel-mpi/bin/
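
    A minimal sketch of pointing the shell at that bundled MPI (the install root and the directory layout under $ALTAIR_HOME are assumptions; adjust to your site's HyperWorks 2017.2 location):

     # Assumed HyperWorks install root; adjust to your site
     export ALTAIR_HOME=/share/applications/altair/2017.2/altair

     # Put the bundled Intel MPI launcher first on PATH (path assumed from the layout above)
     export PATH="$ALTAIR_HOME/mpi/linux64/intel-mpi/bin:$PATH"

     # mpiexec should now resolve to the bundled Intel MPI
     which mpiexec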

     

    I think this line needs a '-n $SLURM_JOB_NUM_NODES' like this

     mpiexec -n $SLURM_JOB_NUM_NODES $ALTAIR_HOME/hwsolvers/radioss/bin/linux64/e_2017.2_linux64_impi -i "$INPUT2" -nt 1
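
    Applied to the multi-node branch of the batch script above, that would look roughly like this (a sketch reusing the variables from the original script, not a verified configuration):

     # Multi-node branch of the sbatch script (sketch)
     module load intel/2013/intel-mpi

     # Starter: decompose the model into one SPMD domain per node
     $ALTAIR_HOME/hwsolvers/radioss/bin/linux64/s_2017.2_linux64 -i "$INPUT1" -nt 1 -np $SLURM_JOB_NUM_NODES

     # Engine: launch one MPI process per domain, matching the starter's -np
     mpiexec -n $SLURM_JOB_NUM_NODES \
         $ALTAIR_HOME/hwsolvers/radioss/bin/linux64/e_2017.2_linux64_impi -i "$INPUT2" -nt 1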

     

    Please download the HyperWorks advanced installation guide PDF from Altair Connect for the other environment variables suggested for running RADIOSS with the Intel MPI.

     

    Alternatively, you can use the script located in scripts/radioss that ships with RADIOSS. Then you don't need to call mpiexec; instead, the command would look something like this, where -np is the number of MPI domains.


    The documentation for this radioss script is in the RADIOSS help under RADIOSS, User Guide, Run Options.

     /altair/hw/2017/altair/scripts/radioss -v 2017.2.1 modelinput_0000.rad -mpi i -np 48 -nt 1 -hostfile /var/spool/PBS/aux/50494.admin -mpiargs -genv KMP_AFFINITY=scatter -genv I_MPI_PIN_DOMAIN=auto -genv I_MPI_ADJUST_BCAST=1 -genv I_MPI_ADJUST_REDUCE=2 -genv I_MPI_MPIRUN_CLEANUP=1 -genv KMP_STACKSIZE=400m -genv I_MPI_FABRICS=shm:dapl -noh3d 
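
    Note that the -hostfile in that example points at a PBS-generated file; under SLURM you would have to build an equivalent file from your allocation yourself. A minimal sketch (standard SLURM commands; the node:cores format is an assumption here and matches the hostfile example given later in this thread):

     # Build a "nodename:cores" hostfile from the current SLURM allocation (sketch)
     scontrol show hostnames "$SLURM_JOB_NODELIST" | \
         awk -v c="$SLURM_NTASKS_PER_NODE" '{print $1":"c}' > hostfile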

     

  • pohan
    pohan Altair Community Member
    edited December 2017


    Hello, I tried to modify the SLURM file as you suggested, but again it did not work. The error is below:

    /share/applications/altair/2017.2/altair/hwsolvers/radioss/bin/linux64/e_2017.2_linux64_impi: error while loading shared libraries: libmpi.so.12: cannot open shared object file: No such file or directory

    ===================================================================================
    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   PID 108268 RUNNING AT styx-06-17
    =   EXIT CODE: 127
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    ===================================================================================
     

     

  • Andy_20955
    Andy_20955 New Altair Community Member
    edited December 2017


    You first need to verify that you can run RADIOSS outside of slurm. 

     

    Since you are running the executable directly, I would assume this is caused by an environment variable not being set correctly. The HyperWorks advanced installation guide PDF from Altair Connect describes the environment variables suggested for running RADIOSS with the Intel MPI.
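
    For example, the libmpi.so.12 error above usually means the Intel MPI runtime libraries are not on LD_LIBRARY_PATH. A sketch of pointing it at the MPI bundled with HyperWorks (the lib subdirectory is an assumption based on the bin path mentioned earlier):

     # Install root taken from the error message above
     export ALTAIR_HOME=/share/applications/altair/2017.2/altair

     # Add the bundled Intel MPI runtime libraries to the library path (directory name assumed)
     export LD_LIBRARY_PATH="$ALTAIR_HOME/mpi/linux64/intel-mpi/lib:$LD_LIBRARY_PATH"

     # The engine executable should now resolve libmpi.so.12
     ldd $ALTAIR_HOME/hwsolvers/radioss/bin/linux64/e_2017.2_linux64_impi | grep libmpi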

     

    But I would recommend just using the radioss script that is included with HyperWorks, since it sets all those environment variables for you. First, try running on one node without Slurm using:

     /altair/hw/2017/altair/scripts/radioss -v 2017.2.1 modelinput_0000.rad -mpi i -np 8 -nt 1 -mpiargs -genv KMP_AFFINITY=scatter -genv I_MPI_PIN_DOMAIN=auto -genv I_MPI_ADJUST_BCAST=1 -genv I_MPI_ADJUST_REDUCE=2 -genv I_MPI_MPIRUN_CLEANUP=1 -genv KMP_STACKSIZE=400m -genv I_MPI_FABRICS=shm:dapl

    Then make a hostfile, which has the format node:cores, like this:

    more hostfile

    node1:16

    node2:16

     

    Then try on two nodes using:

     /altair/hw/2017/altair/scripts/radioss -v 2017.2.1 modelinput_0000.rad -mpi i -np 48 -nt 1 -hostfile hostfile -mpiargs -genv KMP_AFFINITY=scatter -genv I_MPI_PIN_DOMAIN=auto -genv I_MPI_ADJUST_BCAST=1 -genv I_MPI_ADJUST_REDUCE=2 -genv I_MPI_MPIRUN_CLEANUP=1 -genv KMP_STACKSIZE=400m -genv I_MPI_FABRICS=shm:dapl

     

     

     

     

  • pohan
    pohan Altair Community Member
    edited December 2017

    Thank you, Andy.

    Now it works with one node and many cores, but with many nodes and many cores it does not work.

    I also found this syntax for an Intel cluster:

     [radioss@host1~]$ cp $ALTAIR_HOME/hwsolvers/common/bin/linux64/radflex_2017_linux64
     [radioss@host1~]$ $ALTAIR_HOME/hwsolvers/radioss/bin/linux64/s_2017_linux64 -input [ROOTNAME]_0000.rad -np [Nspmd]
     [radioss@host1~]$ [Intel MPI path]/bin/mpirun -configfile [cgfile]

    Could you tell me more about the [Intel MPI path]? Because I use an HPC cluster, I do not know the path of the Intel MPI.

  • Andy_20955
    Andy_20955 New Altair Community Member
    edited December 2017

    Hi,

    It would be much easier if you use the script that comes with HyperWorks to launch RADIOSS. Please try the script; then you don't need to do anything about the Intel MPI path, as the script sets that for you.

     

     /altair/hw/2017/altair/scripts/radioss -v 2017.2.1 modelinput_0000.rad -mpi i -np 48 -nt 1 -hostfile hostfile -mpiargs -genv KMP_AFFINITY=scatter -genv I_MPI_PIN_DOMAIN=auto -genv I_MPI_ADJUST_BCAST=1 -genv I_MPI_ADJUST_REDUCE=2 -genv I_MPI_MPIRUN_CLEANUP=1 -genv KMP_STACKSIZE=400m -genv I_MPI_FABRICS=shm:dapl

     

    with a hostfile that contains,

    more hostfile

    node1:16

    node2:16

     

    Please send the output for this command.
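
    If that runs correctly from an interactive shell, a sketch of wrapping the same call in an sbatch script could look like this (hypothetical: the script path under $ALTAIR_HOME, the core counts, the hostfile generation and the abbreviated -mpiargs are assumptions, not a verified setup):

     #!/bin/bash
     #SBATCH --nodes=2
     #SBATCH --ntasks-per-node=16
     #SBATCH --time=01:00:00
     #SBATCH --job-name=altair-radioss

     module load altair/2017.2

     # Build a node:cores hostfile from the SLURM allocation (same approach as sketched earlier)
     scontrol show hostnames "$SLURM_JOB_NODELIST" | \
         awk -v c="$SLURM_NTASKS_PER_NODE" '{print $1":"c}' > hostfile

     # Total MPI domains = nodes x tasks per node
     NP=$(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE ))

     # Launch through the radioss wrapper script (assumed to live under $ALTAIR_HOME/scripts)
     $ALTAIR_HOME/scripts/radioss -v 2017.2.1 modelinput_0000.rad -mpi i -np $NP -nt 1 \
         -hostfile hostfile -mpiargs -genv KMP_AFFINITY=scatter -genv I_MPI_PIN_DOMAIN=auto \
         -genv KMP_STACKSIZE=400m -genv I_MPI_FABRICS=shm:dapl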

     

    A few more things,

    Can you ssh between the nodes without entering a password?
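
    For reference, a typical way to set up passwordless SSH with standard OpenSSH tools (node names are placeholders):

     # Generate a key pair if you do not already have one (default path, empty passphrase)
     ssh-keygen -t rsa -b 4096

     # Copy the public key to every other node in the allocation
     ssh-copy-id node2

     # Verify: this should print the remote hostname without asking for a password
     ssh node2 hostname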

     

    Last, the HyperWorks installation should be accessible to all nodes in the same location by installing on a shared drive or locally in the same place on each node.