Hello,
I'm trying to get Radioss 2023.1 to work for us, but I get the following error message when submitting a test tutorial via Access 2021.1.2 / PBS Pro 2021.1.4 (RHEL 7.9):
start.py: INFO: Running command /cm/shared/apps/altair/2023.1/altair/scripts/radioss BIRD_WINDSHIELD_v1_0001.rad -onestep -mpi i -hostfile /cm/local/apps/pbspro/var/spool/aux/51563.serverxy -np 24 -nt 1 -v 2023.1

WARNING in hwsolver script execution:
Default environment variables for MPI run are provided because none were set:
setenv KMP_AFFINITY scatter
setenv I_MPI_PIN_DOMAIN auto
setenv I_MPI_ADJUST_BCAST 1
setenv I_MPI_ADJUST_REDUCE 2
This setup is often adequate, however use of hardware specific values may allow to tune for better performance.
*************************************
[mpiexec@cn001] Launch arguments: /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_bstrap_proxy --upstream-host cn001.cm.cluster --upstream-port 36138 --pgid 0 --launcher ssh --launcher-number 0 --launcher-exec /cm/shared/apps/pbspro/current/bin/pbs_tmrsh --base-path /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 9 /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
[mpiexec@cn001] Launch arguments: /cm/shared/apps/pbspro/current/bin/pbs_tmrsh -q -x cn005.cm.cluster /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_bstrap_proxy --upstream-host cn001.cm.cluster --upstream-port 36138 --pgid 0 --launcher ssh --launcher-number 0 --launcher-exec /cm/shared/apps/pbspro/current/bin/pbs_tmrsh --base-path /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9

SPMD RD solver run ({/cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/mpirun} -bootstrap ssh -envall -machinefile /cm/local/apps/pbspro/var/spool/aux/51563.serverxy -n 24 /cm/shared/apps/altair/2023.1/altair/hwsolvers/radioss/bin/linux64/e_2023.1_linux64_impi -i {BIRD_WINDSHIELD_v1_0001.rad} -nt 1) crashed

StdErr output from the solver:
==============================
[mpiexec@cn001] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:117): unable to run bstrap_proxy on cn005.cm.cluster (pid 110057, exit code 65280)
[mpiexec@cn001] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[mpiexec@cn001] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[mpiexec@cn001] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1065): error waiting for event
[mpiexec@cn001] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1026): error setting up the bootstrap proxies
[mpiexec@cn001] Possible reasons:
[mpiexec@cn001] 1. Host is unavailable. Please check that all hosts are available.
[mpiexec@cn001] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[mpiexec@cn001] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[mpiexec@cn001] 4. Ssh bootstrap cannot launch processes on remote host. Make sure that passwordless ssh connection is established across compute hosts.
[mpiexec@cn001] You may try using -bootstrap option to select alternative launcher.
If I run the steps manually from the command line (launch the starter file, deploy the rst/include files to the nodes, then run the engine file), everything is fine.
If I launch the above mpirun command with -bootstrap ssh from the command line, it also works.
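Since the ssh launcher works interactively, one workaround I am considering is forcing it for the submitted run as well. A minimal sketch, assuming these lines can be placed next to the other os.environ settings below (I_MPI_HYDRA_BOOTSTRAP and I_MPI_HYDRA_BOOTSTRAP_EXEC are documented Intel MPI variables; the ssh path is an assumption on my part):

import os

# Force the ssh bootstrap that works from the command line instead of
# the pbs_tmrsh launcher that fails in the submitted job.
os.environ['I_MPI_HYDRA_BOOTSTRAP'] = 'ssh'
# Optionally pin the ssh binary explicitly (path is an assumption):
# os.environ['I_MPI_HYDRA_BOOTSTRAP_EXEC'] = '/usr/bin/ssh'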
I also tried setting the following parameters:
os.environ['I_MPI_DEBUG'] = '6'
os.environ['I_MPI_HYDRA_IFACE'] = 'ib0'
os.environ['I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS'] = '-disable-x'
os.environ['I_MPI_COLL_EXTERNAL'] = '0'
os.environ['I_MPI_ADJUST_GATHERV'] = '3'
os.environ['I_MPI_PORT_RANGE'] = '20000:30000'
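The trace shows pbs_tmrsh failing to start hydra_bstrap_proxy on cn005 (exit code 65280, presumably a remote exit status of 255 in the high byte), so checking pbs_tmrsh against every allocated host might narrow this down. A minimal diagnostic sketch, assuming it runs inside the PBS job so PBS_NODEFILE is set; the pbs_tmrsh path is taken from the log above:

import os
import subprocess

PBS_TMRSH = '/cm/shared/apps/pbspro/current/bin/pbs_tmrsh'  # path from the log

# Unique list of hosts allocated to this job.
with open(os.environ['PBS_NODEFILE']) as f:
    hosts = sorted(set(line.strip() for line in f if line.strip()))

# Try to run a trivial command on each host through pbs_tmrsh, i.e. the
# same bootstrap step that hydra_bstrap_proxy fails on.
for host in hosts:
    result = subprocess.run([PBS_TMRSH, host, 'true'],
                            capture_output=True, text=True)
    print(f'{host}: exit={result.returncode} stderr={result.stderr.strip()!r}')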
Passwordless ssh is configured. Previous Radioss versions such as 2022.2 work fine with both of the following parameters set:
os.environ['I_MPI_HYDRA_BOOTSTRAP'] = 'rsh'
os.environ['I_MPI_HYDRA_BOOTSTRAP_EXEC'] = '/cm/shared/apps/pbspro/current/bin/pbs_tmrsh'
The command ./mpirun -host cn002.cm.cluster cpuinfo, run from the Intel MPI installation path of HyperWorks 2023.1, also works.
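That test only touches cn002, though, while the job above failed on cn005, so repeating it for every allocated host may show whether only specific nodes are affected. A minimal sketch along the same lines, again assuming it runs inside a PBS job so PBS_NODEFILE is set (the mpirun path is from the log):

import os
import subprocess

MPIRUN = '/cm/shared/apps/altair/2023.1/altair/mpi/linux64/intel-mpi/bin/mpirun'

with open(os.environ['PBS_NODEFILE']) as f:
    hosts = sorted(set(line.strip() for line in f if line.strip()))

# Repeat the working single-host cpuinfo test against each host in turn.
for host in hosts:
    result = subprocess.run([MPIRUN, '-host', host, 'cpuinfo'],
                            capture_output=True, text=True)
    status = 'ok' if result.returncode == 0 else f'FAILED (exit {result.returncode})'
    print(f'{host}: {status}')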
Does anyone have a similar experience?
Kind regards,
Michael