Running Radioss-SPMD Job

Michael_21313 Altair Community Member
edited May 27 in Community Q&A

Hello, 

I'm trying to get Radioss 2023.1 working for us, but I get the following error message when submitting a test tutorial via Access 2021.1.2 / PBS Professional 2021.1.4 (RHEL 7.9):

start.py: INFO: Running command /cm/shared/apps/altair/2023.1/altair/scripts/radioss BIRD_WINDSHIELD_v1_0001.rad -onestep -mpi i -hostfile /cm/local/apps/pbspro/var/spool/aux/51563.serverxy -np 24 -nt 1 -v 2023.1

WARNING in hwsolver script execution:
Default environment variables for MPI run are provided because none were set:
    setenv KMP_AFFINITY scatter
    setenv I_MPI_PIN_DOMAIN auto
    setenv I_MPI_ADJUST_BCAST 1
    setenv I_MPI_ADJUST_REDUCE 2
This setup is often adequate, however use of hardware specific values may allow to tune for better performance.
*************************************

[mpiexec@cn001] Launch arguments: /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_bstrap_proxy --upstream-host cn001.cm.cluster --upstream-port 36138 --pgid 0 --launcher ssh --launcher-number 0 --launcher-exec /cm/shared/apps/pbspro/current/bin/pbs_tmrsh --base-path /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 9 /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9

[mpiexec@cn001] Launch arguments: /cm/shared/apps/pbspro/current/bin/pbs_tmrsh -q -x cn005.cm.cluster /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_bstrap_proxy --upstream-host cn001.cm.cluster --upstream-port 36138 --pgid 0 --launcher ssh --launcher-number 0 --launcher-exec /cm/shared/apps/pbspro/current/bin/pbs_tmrsh --base-path /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9

SPMD RD solver run ({/cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/mpirun} -bootstrap ssh -envall -machinefile /cm/local/apps/pbspro/var/spool/aux/51563.serverxy -n 24 /cm/shared/apps/altair/2023.1/altair/hwsolvers/radioss/bin/linux64/e_2023.1_linux64_impi -i {BIRD_WINDSHIELD_v1_0001.rad} -nt 1) crashed

StdErr output from the solver:
==============================
[mpiexec@cn001] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on cn005.cm.cluster (pid 110057, exit code 65280)
[mpiexec@cn001] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@cn001] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@cn001] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1065): error waiting for event
[mpiexec@cn001] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1026): error setting up the bootstrap proxies
[mpiexec@cn001] Possible reasons:
[mpiexec@cn001] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@cn001] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@cn001] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@cn001] 4. Ssh bootstrap cannot launch processes on remote host. Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@cn001]    You may try using -bootstrap option to select alternative launcher.

If I launch the starter file from the command line, deploy the .rst/include files to the other nodes, and then run the engine file, everything works fine.

If I launch the above mpirun command with -bootstrap ssh from the command line, it also works.
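
For reference, this is roughly what that working run looks like when reproduced by hand; a minimal Python sketch (paths and arguments are copied from the job log above, and wrapping the command in subprocess is only for illustration):

import subprocess

# Engine run as the hwsolver script builds it, but launched by hand with the
# ssh bootstrap. The starter output and the *.rst files must already be in
# place on every node (radioss -starter plus scp, as described above).
subprocess.run(
    ['/cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/mpirun',
     '-bootstrap', 'ssh', '-envall',
     '-machinefile', '/cm/local/apps/pbspro/var/spool/aux/51563.serverxy',
     '-n', '24',
     '/cm/shared/apps/altair/2023.1/altair/hwsolvers/radioss/bin/linux64/e_2023.1_linux64_impi',
     '-i', 'BIRD_WINDSHIELD_v1_0001.rad', '-nt', '1'],
    check=True)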

I also tried the following parameters:

os.environ['I_MPI_DEBUG'] = '6'
os.environ['I_MPI_HYDRA_IFACE'] = 'ib0'
os.environ['I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS'] = '-disable-x'
os.environ['I_MPI_COLL_EXTERNAL'] = '0'
os.environ['I_MPI_ADJUST_GATHERV'] = '3'
os.environ['I_MPI_PORT_RANGE'] = '20000:30000'

Passwordless ssh is configured. Previous versions of Radioss, such as 2022.2, work fine with both of the following parameters:

os.environ['I_MPI_HYDRA_BOOTSTRAP'] = 'rsh'
os.environ['I_MPI_HYDRA_BOOTSTRAP_EXEC'] = '/cm/shared/apps/pbspro/current/bin/pbs_tmrsh'
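
Put together, the two bootstrap configurations being compared look like this; a minimal sketch, assuming the variables are injected into the solver environment the same way as the snippets above (use one option or the other, not both):

import os

# Option 1: keep the ssh bootstrap that Intel MPI in 2023.1 defaults to, and
# restrict the port range in case a firewall is filtering connections.
os.environ['I_MPI_HYDRA_BOOTSTRAP'] = 'ssh'
os.environ['I_MPI_PORT_RANGE'] = '20000:30000'

# Option 2: hand the bootstrap to the PBS task manager, as with Radioss 2022.2.
os.environ['I_MPI_HYDRA_BOOTSTRAP'] = 'rsh'
os.environ['I_MPI_HYDRA_BOOTSTRAP_EXEC'] = '/cm/shared/apps/pbspro/current/bin/pbs_tmrsh'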

Running ./mpirun -host cn002.cm.cluster cpuinfo from the HyperWorks 2023.1 installation path also works.
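
To rule out connectivity from the first compute node to every host handed out by PBS, a loop like the following can be run there; a minimal sketch (the hostfile path is the one from the job log above, and the ssh test itself is an assumption, not part of the Access setup):

import subprocess

# Read the unique hosts from the PBS-generated hostfile and try a trivial
# passwordless ssh command on each one, which is what hydra_bstrap_proxy needs.
hostfile = '/cm/local/apps/pbspro/var/spool/aux/51563.serverxy'
with open(hostfile) as f:
    hosts = sorted(set(line.strip() for line in f if line.strip()))
for host in hosts:
    result = subprocess.run(['ssh', '-o', 'BatchMode=yes', host, 'hostname'],
                            capture_output=True, text=True)
    print(host, 'OK' if result.returncode == 0 else f'FAILED (exit {result.returncode})')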

Have you had a similar experience?

Kind regards, 

Michael

 

 

Answers

  • Olivier Wienholtz Altair
    Altair Employee
    edited May 21

    Hi Michael,

    It looks like the node cn005.cm.cluster could not be accessed.

    The Intel MPI error message indicates this.

    Can you please have a look at your hostfile:

    /cm/local/apps/pbspro/var/spool/aux/51563.serverxy

    and verify that every node in it is accessible.

     

    Best Regards,

    Olivier W.

     

     

  • Michael_21313
    Altair Community Member
    edited May 21

    Hi Olivier. 

    This is a different example: I submitted one job and requested 2 chunks with 12 cores each.

    This is how it looks:

    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn011.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
    cn006.cm.cluster
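
    A quick way to sanity-check such a hostfile is to count the entries per node; a small sketch, assuming the standard PBS_NODEFILE variable points at the file for the running job:

    import os
    from collections import Counter

    # The PBS-generated hostfile contains one line per requested core.
    with open(os.environ['PBS_NODEFILE']) as f:
        counts = Counter(line.strip() for line in f if line.strip())
    for host, n in counts.items():
        print(f'{host}: {n} entries')

    For the file above this should report 12 entries for cn011.cm.cluster and 12 for cn006.cm.cluster.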

    With Radioss 2022.2, everything works as expected.

    Kind regards, 

    Michael

     

     

     

     

  • Olivier Wienholtz Altair
    Altair Employee
    edited May 23

    Hello,

    There was an Intel MPI upgrade in 2023.1 compared to 2022.2.

    We had to force the bootstrap to ssh.

    This error indicates that the first compute node, which launches MPI, was not able to reach another node in the list over ssh.
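
    One way to confirm which launcher 2023.1 actually uses at run time is to enable hydra's debug output and look at the "Launch arguments:" lines, which show whether the remote proxies are started with /bin/ssh or with pbs_tmrsh; a minimal sketch, using the same variables that appear in the logs below:

    import os

    # Hydra prints its launch command lines when debug output is enabled.
    os.environ['I_MPI_HYDRA_DEBUG'] = 'enable'
    os.environ['I_MPI_DEBUG'] = '6'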

     

  • Michael_21313
    Altair Community Member
    edited May 27


    Hello Olivier. 

    Thanks for your response. I have seen that in the manual and went through Intel's troubleshooting section. At least it is possible to initiate a Radioss-SPMD job from the shell, but not through Altair Access with the PBS Professional job scheduler.

    This is how it looks:

    start.py: INFO: Stack size limit is (-0.0009765625,-0.0009765625). start.py: INFO: Max locked memory limit is (-0.0009765625,-0.0009765625). start.py: INFO: Unlimit stack size start.py: INFO: Stack size limit is (-0.0009765625,-0.0009765625). start.py: INFO: Unlimit max locked memory size start.py: INFO: Max locked memory limit is (-0.0009765625,-0.0009765625). start.py: INFO hostfile is set to /cm/local/apps/pbspro/var/spool/aux/51840.denueslhpcapp01 Primary node = cn001.cm.cluster Remote nodes = cn014.cm.cluster Cores on primary node = 2 Radioss basename = BIRD_WINDSHIELD_v1 Radioss number = 0000 start.py: INFO: Running command /cm/shared/apps/altair/2023.1/altair/scripts/radioss BIRD_WINDSHIELD_v1_0000.rad -starter -np 4 -nt 1  -v 2023.1 ************************************************************************ **                                                                    ** **                                                                    ** **                 Altair Radioss(TM) Starter 2023.1                  ** **                                                                    ** **            Non-linear Finite Element Analysis Software             ** **                   from Altair Engineering, Inc.                    ** **                                                                    ** **                                                                    ** **                   Linux 64 bits, Intel compiler                    ** **                                                                    ** **                                                                    ** **                                                                    ** ** Build tag: 1192156_1263820231_1060_0101891                         ** ************************************************************************ **  COPYRIGHT (C) 1986-2023                 Altair Engineering, Inc.  ** ** All Rights Reserved.  Copyright notice does not imply publication. ** ** Contains trade secrets of Altair Engineering Inc.                  ** ** Decompilation or disassembly of this software strictly prohibited. ** ************************************************************************     .. UNITS SYSTEM                                                                           .. CONTROL VARIABLES                                                                      .. STARTER RUNNING ON    1 THREAD  .. FUNCTIONS & TABLES  .. MATERIALS                                                                              .. NODES                                                                                  .. SUBMODELS  .. PROPERTIES                                                                             .. 3D SHELL ELEMENTS                                                                      .. 3D BEAM ELEMENTS                                                                       .. 3D SPRING ELEMENTS                                                                     .. 3D TRIANGULAR SHELL ELEMENTS                                                           .. SPH PARTICLES DEFINITION  .. SUBSETS  .. ELEMENT GROUPS  .. PART GROUPS  .. SURFACES   .. NODE GROUP  .. BOUNDARY CONDITIONS                                                                    .. INITIAL VELOCITIES                                                                     .. DOMAIN DECOMPOSITION  .. ELEMENT GROUPS                                                                         .. 
INTERFACES                                                                             .. INTERFACE BUFFER INITIALIZATION                                                        .. RIGID BODIES                                                                           .. RETURNS TO DOMAIN DECOMPOSITION FOR OPTIMIZATION  .. DOMAIN DECOMPOSITION  .. ELEMENT GROUPS                                                                         .. INTERFACES                                                                             .. INTERFACE BUFFER INITIALIZATION                                                        WARNING ID :    343 ** WARNING: INITIAL PENETRATIONS IN INTERFACE  .. RIGID BODIES                                                                           .. ELEMENT BUFFER INITIALIZATION                                                          .. GEOMETRY PLOT FILE                                                                     .. PARALLEL RESTART FILES GENERATION                                                        ------------------------------------------------------------------------                       ** COMPUTE TIME INFORMATION **     EXECUTION STARTED      :      2024/05/26  18:27:39  EXECUTION COMPLETED    :      2024/05/26  18:27:42     ELAPSED TIME...........=          3.32 s                                00:00:03    ------------------------------------------------------------------------         TERMINATION WITH WARNING                 ------------------                                                                                 0 ERROR(S)                           1 WARNING(S)               PLEASE CHECK LISTING FILE FOR FURTHER DETAILS   Terminating run because -onestep option is set  WARNING in hwsolver script execution:  No anim files created *************************************  Radioss:: Solver run finished.   Engine files = BIRD_WINDSHIELD_v1_0001.rad Restart files to distribute = BIRD_WINDSHIELD_v1_0000_0001.rst, BIRD_WINDSHIELD_v1_0000_0002.rst, BIRD_WINDSHIELD_v1_0000_0003.rst, BIRD_WINDSHIELD_v1_0000_0004.rst Run cmd /usr/bin/scp BIRD_WINDSHIELD_v1_0001.rad cn014.cm.cluster:/scratch/pbs.51840.denueslhpcapp01.x8z/BIRD_WINDSHIELD_v1_0001.rad Run cmd /usr/bin/scp BIRD_WINDSHIELD_v1_0000_0001.rst cn014.cm.cluster:/scratch/pbs.51840.denueslhpcapp01.x8z/BIRD_WINDSHIELD_v1_0000_0001.rst Run cmd /usr/bin/scp BIRD_WINDSHIELD_v1_0000_0002.rst cn014.cm.cluster:/scratch/pbs.51840.denueslhpcapp01.x8z/BIRD_WINDSHIELD_v1_0000_0002.rst Run cmd /usr/bin/scp BIRD_WINDSHIELD_v1_0000_0003.rst cn014.cm.cluster:/scratch/pbs.51840.denueslhpcapp01.x8z/BIRD_WINDSHIELD_v1_0000_0003.rst Run cmd /usr/bin/scp BIRD_WINDSHIELD_v1_0000_0004.rst cn014.cm.cluster:/scratch/pbs.51840.denueslhpcapp01.x8z/BIRD_WINDSHIELD_v1_0000_0004.rst start.py: INFO: Running command /cm/shared/apps/altair/2023.1/altair/scripts/radioss BIRD_WINDSHIELD_v1_0001.rad -onestep -mpi i -hostfile /cm/local/apps/pbspro/var/spool/aux/51840.denueslhpcapp01 -np 4 -nt 1 -v 2023.1 WARNING in hwsolver script execution:  Default environment variables for MPI run  are provided because none were set:     setenv KMP_AFFINITY scatter    setenv I_MPI_PIN_DOMAIN auto    setenv I_MPI_ADJUST_BCAST 1    setenv I_MPI_ADJUST_REDUCE 2 This setup is often adequate, however use of hardware specific values may allow to tune for better performance. 
*************************************  [mpiexec@cn001] Launch arguments: /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_bstrap_proxy --upstream-host cn001.cm.cluster --upstream-port 40878 --pgid 0 --launcher ssh --launcher-number 0 --base-path /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 9 /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9  [mpiexec@cn001] Launch arguments: /bin/ssh -q -x cn014.cm.cluster /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_bstrap_proxy --upstream-host cn001.cm.cluster --upstream-port 40878 --pgid 0 --launcher ssh --launcher-number 0 --base-path /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9  [proxy:0:1@cn014] pmi cmd from fd 4: cmd=init pmi_version=1 pmi_subversion=1 [proxy:0:1@cn014] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:1@cn014] pmi cmd from fd 4: cmd=get_maxes [proxy:0:1@cn014] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096 [proxy:0:1@cn014] pmi cmd from fd 4: cmd=get_appnum [proxy:0:1@cn014] PMI response: cmd=appnum appnum=0 [proxy:0:1@cn014] pmi cmd from fd 4: cmd=get_my_kvsname [proxy:0:1@cn014] PMI response: cmd=my_kvsname kvsname=kvs_219433_0 [proxy:0:1@cn014] pmi cmd from fd 4: cmd=get kvsname=kvs_219433_0 key=PMI_process_mapping [proxy:0:1@cn014] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,2)) [proxy:0:1@cn014] pmi cmd from fd 5: cmd=init pmi_version=1 pmi_subversion=1 [proxy:0:1@cn014] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:1@cn014] pmi cmd from fd 5: cmd=get_maxes [proxy:0:1@cn014] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096 [proxy:0:1@cn014] pmi cmd from fd 5: cmd=get_appnum [proxy:0:1@cn014] PMI response: cmd=appnum appnum=0 [proxy:0:1@cn014] pmi cmd from fd 5: cmd=get_my_kvsname [proxy:0:1@cn014] PMI response: cmd=my_kvsname kvsname=kvs_219433_0 [proxy:0:1@cn014] pmi cmd from fd 5: cmd=get kvsname=kvs_219433_0 key=PMI_process_mapping [proxy:0:1@cn014] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,2)) [proxy:0:1@cn014] pmi cmd from fd 5: cmd=barrier_in [proxy:0:1@cn014] pmi cmd from fd 4: cmd=put kvsname=kvs_219433_0 key=-bcast-1-2 value=2F6465762F73686D2F496E74656C5F4D50495F305255704A46 [proxy:0:1@cn014] PMI response: cmd=put_result rc=0 msg=success [proxy:0:1@cn014] pmi cmd from fd 4: cmd=barrier_in [proxy:0:0@cn001] pmi cmd from fd 8: cmd=init pmi_version=1 pmi_subversion=1 [proxy:0:0@cn001] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@cn001] pmi cmd from fd 11: cmd=init pmi_version=1 pmi_subversion=1 [proxy:0:0@cn001] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0 [proxy:0:0@cn001] pmi cmd from fd 8: cmd=get_maxes [proxy:0:0@cn001] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096 [proxy:0:0@cn001] pmi cmd from fd 11: cmd=get_maxes [proxy:0:0@cn001] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096 [proxy:0:0@cn001] pmi cmd from fd 11: cmd=get_appnum 
[proxy:0:0@cn001] PMI response: cmd=appnum appnum=0 [proxy:0:0@cn001] pmi cmd from fd 11: cmd=get_my_kvsname [proxy:0:0@cn001] PMI response: cmd=my_kvsname kvsname=kvs_219433_0 [proxy:0:0@cn001] pmi cmd from fd 11: cmd=get kvsname=kvs_219433_0 key=PMI_process_mapping [proxy:0:0@cn001] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,2)) [proxy:0:0@cn001] pmi cmd from fd 8: cmd=get_appnum [proxy:0:0@cn001] PMI response: cmd=appnum appnum=0 [proxy:0:0@cn001] pmi cmd from fd 8: cmd=get_my_kvsname [proxy:0:0@cn001] PMI response: cmd=my_kvsname kvsname=kvs_219433_0 [proxy:0:0@cn001] pmi cmd from fd 8: cmd=get kvsname=kvs_219433_0 key=PMI_process_mapping [proxy:0:0@cn001] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,2)) [0] MPI startup(): Intel(R) MPI Library, Version 2021.10  Build 20230619 (id: c2e19c2f3e) [0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation.  All rights reserved. [0] MPI startup(): library kind: release [proxy:0:0@cn001] pmi cmd from fd 11: cmd=barrier_in [proxy:0:0@cn001] pmi cmd from fd 8: cmd=put kvsname=kvs_219433_0 key=-bcast-1-0 value=2F6465762F73686D2F496E74656C5F4D50495F6661545A3433 [proxy:0:0@cn001] PMI response: cmd=put_result rc=0 msg=success [proxy:0:0@cn001] pmi cmd from fd 8: cmd=barrier_in [proxy:0:0@cn001] PMI response: cmd=barrier_out [proxy:0:0@cn001] PMI response: cmd=barrier_out [proxy:0:0@cn001] pmi cmd from fd 11: cmd=get kvsname=kvs_219433_0 key=-bcast-1-0 [proxy:0:0@cn001] PMI response: cmd=get_result rc=0 msg=success value=2F6465762F73686D2F496E74656C5F4D50495F6661545A3433 [proxy:0:1@cn014] PMI response: cmd=barrier_out [proxy:0:1@cn014] PMI response: cmd=barrier_out [proxy:0:1@cn014] pmi cmd from fd 5: cmd=get kvsname=kvs_219433_0 key=-bcast-1-2 [proxy:0:1@cn014] PMI response: cmd=get_result rc=0 msg=success value=2F6465762F73686D2F496E74656C5F4D50495F305255704A46 [0] MPI startup(): libfabric version: 1.18.0-impi [proxy:0:1@cn014] pmi cmd from fd 5: cmd=barrier_in [proxy:0:1@cn014] pmi cmd from fd 4: cmd=put kvsname=kvs_219433_0 key=bc-2 value=mpi#03433F001500000000000000000080FECCED510003F6CEB80000000000000000$ [proxy:0:1@cn014] PMI response: cmd=put_result rc=0 msg=success [proxy:0:1@cn014] pmi cmd from fd 4: cmd=barrier_in [0] MPI startup(): libfabric provider: psm3 [proxy:0:0@cn001] pmi cmd from fd 8: cmd=put kvsname=kvs_219433_0 key=bc-0 value=mpi#03BEDE000900000000000000000080FEAC3A580003F6CEB80000000000000000$ [proxy:0:0@cn001] PMI response: cmd=put_result rc=0 msg=success [proxy:0:0@cn001] pmi cmd from fd 11: cmd=barrier_in [proxy:0:0@cn001] pmi cmd from fd 8: cmd=barrier_in [proxy:0:0@cn001] PMI response: cmd=barrier_out [proxy:0:0@cn001] PMI response: cmd=barrier_out [proxy:0:0@cn001] pmi cmd from fd 8: cmd=get kvsname=kvs_219433_0 key=bc-0 [proxy:0:0@cn001] PMI response: cmd=get_result rc=0 msg=success value=mpi#03BEDE000900000000000000000080FEAC3A580003F6CEB80000000000000000$ [proxy:0:0@cn001] pmi cmd from fd 11: cmd=get kvsname=kvs_219433_0 key=bc-2 [proxy:0:0@cn001] PMI response: cmd=get_result rc=0 msg=success value=mpi#03433F001500000000000000000080FECCED510003F6CEB80000000000000000$ [proxy:0:1@cn014] PMI response: cmd=barrier_out [proxy:0:1@cn014] PMI response: cmd=barrier_out [proxy:0:1@cn014] pmi cmd from fd 4: cmd=get kvsname=kvs_219433_0 key=bc-0 [proxy:0:1@cn014] PMI response: cmd=get_result rc=0 msg=success value=mpi#03BEDE000900000000000000000080FEAC3A580003F6CEB80000000000000000$ [proxy:0:1@cn014] pmi cmd from fd 5: cmd=get kvsname=kvs_219433_0 key=bc-2 [proxy:0:1@cn014] PMI 
response: cmd=get_result rc=0 msg=success value=mpi#03433F001500000000000000000080FECCED510003F6CEB80000000000000000$ [0] MPI startup(): File "/cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/etc/tuning_skx_shm-ofi_psm3_100.dat" not found [0] MPI startup(): Load tuning file: "/cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/etc/tuning_skx_shm-ofi_psm3.dat" [0] MPI startup(): Rank    Pid      Node name  Pin cpu [0] MPI startup(): 0       219438   cn001      {1} [0] MPI startup(): 1       219439   cn001      {5} [0] MPI startup(): 2       269221   cn014      {0,1,4,5,8,9,12,13,16,17,20,21,24,25,28,29,32,33,36,37,40,41,44,45} [0] MPI startup(): 3       269222   cn014      {2,3,6,7,10,11,14,15,18,19,22,23,26,27,30,31,34,35,38,39,42,43,46,47} [0] MPI startup(): I_MPI_DEBUG_OUTPUT=debug_output.txt [0] MPI startup(): I_MPI_ROOT=/cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi [0] MPI startup(): I_MPI_MPIRUN=mpirun [0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc [0] MPI startup(): I_MPI_HYDRA_DEBUG=enable [0] MPI startup(): I_MPI_HYDRA_ENV=all [0] MPI startup(): I_MPI_HYDRA_RMK=pbs [0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc [0] MPI startup(): I_MPI_HYDRA_GDB_REMOTE_SHELL=ssh [0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=ssh [0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=1 [0] MPI startup(): I_MPI_ADJUST_BCAST=1 [0] MPI startup(): I_MPI_ADJUST_REDUCE=2 [0] MPI startup(): I_MPI_PIN_DOMAIN=auto [0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default [0] MPI startup(): I_MPI_FABRICS=shm:ofi [0] MPI startup(): I_MPI_DEBUG=5 ************************************************************************ **                                                                    ** **                                                                    ** **                  Altair Radioss(TM) Engine 2023.1                  ** **                                                                    ** **            Non-linear Finite Element Analysis Software             ** **                   from Altair Engineering, Inc.                    ** **                                                                    ** **                                                                    ** **              Linux 64 bits, Intel compiler, Intel MPI              ** **                                                                    ** **                                                                    ** **                                                                    ** ** Build tag: 1192156_1263820231_2070_0101891                         ** ************************************************************************ **  COPYRIGHT (C) 1986-2023                 Altair Engineering, Inc.  ** ** All Rights Reserved.  Copyright notice does not imply publication. ** ** Contains trade secrets of Altair Engineering Inc.                  ** ** Decompilation or disassembly of this software strictly prohibited. ** ************************************************************************    ROOT: BIRD_WINDSHIELD_v1  RESTART: 0001 

    The first part, the starter, works as expected and the .rst files are also available on the remote node.

    The last part, the "onestep" engine run, gets stuck for some reason.

    Kind regards, 

    Michael