Feko run fails with floating point exception

User: "Krishna_21416"
Altair Community Member
Updated by Krishna_21416

I am trying to run the Feko binary on an HPC cluster, as shown below.

/mnt/share/codes/feko/2022.2/altair/feko/bin/runfeko /mnt/share/benchmarks/feko/generic_sedan_parametric_1000.fek -np $num_cores --machines-file machinefile -d --mpi-options $MPI_OPTIONS
 
where $MPI_OPTIONS is set to -genv I_MPI_DEBUG=5 -genv I_MPI_PIN=1 -genv FI_PROVIDER=mlx -genv USE_UCX=1 -genv UCX_MAX_RNDV_RAILS=1
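One detail that may be worth checking: because $MPI_OPTIONS is unquoted in the command above, the shell splits it into separate words before runfeko sees it. The sketch below only demonstrates that shell behaviour. count_args is a hypothetical helper, and whether runfeko expects the MPI options as a single quoted argument is an assumption that should be verified against the runfeko documentation. Since the same command works on other clusters, this may well be harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Feko): prints how many arguments it received.
count_args() { echo "$#"; }

# Same value as in the post.
MPI_OPTIONS="-genv I_MPI_DEBUG=5 -genv I_MPI_PIN=1 -genv FI_PROVIDER=mlx -genv USE_UCX=1 -genv UCX_MAX_RNDV_RAILS=1"

# Unquoted: the shell splits the value on whitespace into 10 words,
# so the helper sees 11 arguments including --mpi-options.
count_args --mpi-options $MPI_OPTIONS

# Quoted: the whole string is passed as one argument, 2 in total.
count_args --mpi-options "$MPI_OPTIONS"
```

If the launcher expects one option string, quoting the variable would keep it intact. If it simply forwards every remaining word to the MPI launcher, the unquoted form is equivalent.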
 
While this runs fine on a few clusters, on one specific cluster the run fails with the error below.
 
 Feko caught signal 8 (PID 3052949)
  Memory location which caused fault: 0x3f4002e9595
 Floating point exception: Unknown exception with subcode=-6
 Feko caught signal 8 (PID 3053073)
  Memory location which caused fault: 0x3f4002e9611
 Floating point exception: Unknown exception with subcode=-6
 The following message from the master process (MYID= 0):
 ERROR    3977: Internal Feko error. Please notify the Feko support team and provide the error number, preferably together with the Feko input and output files.
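
A note on reading this log, offered as an interpretation rather than a confirmed diagnosis: on Linux, a signal subcode of -6 corresponds to SI_TKILL, which means the signal was sent with tkill/tgkill (as raise() does) rather than generated by an arithmetic fault. In that case the field printed as the faulting memory location overlaps the sender's PID and UID. The sketch below assumes a 64-bit Linux siginfo layout, with the PID in the low 32 bits and the UID in the high 32 bits. decode_fault_addr is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a reported "fault address" into the
# sender PID (low 32 bits) and sender UID (high 32 bits).
# Assumes the 64-bit Linux siginfo layout for signals sent via tkill/tgkill.
decode_fault_addr() {
  local addr=$1
  local pid=$(( addr & 0xffffffff ))
  local uid=$(( addr >> 32 ))
  echo "pid=$pid uid=$uid"
}

# The two addresses printed in the log above.
decode_fault_addr 0x3f4002e9595
decode_fault_addr 0x3f4002e9611
```

Both values decode to the PIDs printed in the "Feko caught signal 8" lines (3052949 and 3053073) with UID 1012. This suggests each process raised the signal on itself, possibly from a library underneath Feko such as the MPI or fabric layer, rather than hitting a genuine divide-by-zero.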
 
and 
 
feko_parallel(debug): Exiting with return code 2 (0, 2, 0)
RUNFEKO(debug): Forked child process "feko_parallel" with pid = 3052771

ERROR  20011:

  Error when executing the program /mnt/share/codes/feko/2022.2/altair/feko/bin/feko_parallel
  with the options " 256 /mnt/share/benchmarks/feko/generic_sedan_parametric_1000 --machines-file machinefile -genv"
  (error codes: 2 ; 0 [Success])
  See above error message of the program for more details!
RUNFEKO(debug): Error while executing feko_parallel
 
What are the possible reasons for this failure, and what would be required to fix it?
