Feko run fails with floating point exception
I am trying to run the Feko binary as shown below on an HPC cluster setup.
/mnt/share/codes/feko/2022.2/altair/feko/bin/runfeko /mnt/share/benchmarks/feko/generic_sedan_parametric_1000.fek -np $num_cores --machines-file machinefile -d --mpi-options $MPI_OPTIONS
-genv I_MPI_DEBUG=5 -genv I_MPI_PIN=1 -genv FI_PROVIDER=mlx -genv USE_UCX=1 -genv UCX_MAX_RNDV_RAILS=1
Feko caught signal 8 (PID 3052949)
Memory location which caused fault: 0x3f4002e9595
Floating point exception: Unknown exception with subcode=-6
Feko caught signal 8 (PID 3053073)
Memory location which caused fault: 0x3f4002e9611
Floating point exception: Unknown exception with subcode=-6
The following message from the master process (MYID= 0):
ERROR 3977: Internal Feko error. Please notify the Feko support team and provide the error number, preferably together with the Feko input and output files.
feko_parallel(debug): Exiting with return code 2 (0, 2, 0)
RUNFEKO(debug): Forked child process "feko_parallel" with pid = 3052771
ERROR 20011:
Error when executing the program /mnt/share/codes/feko/2022.2/altair/feko/bin/feko_parallel
with the options " 256 /mnt/share/benchmarks/feko/generic_sedan_parametric_1000 --machines-file machinefile -genv"
(error codes: 2 ; 0 [Success])
See above error message of the program for more details!
RUNFEKO(debug): Error while executing feko_parallel
Hi Krishna,
I'm not an expert in HPC installations, but maybe it's a problem with missing memory. Does the problem also exist for other (small) models?
Best regards,
Torben
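To check the memory theory on the failing node, a small sketch like the following could help. It only reads the standard Linux `/proc/meminfo` fields `MemTotal` and `MemAvailable`; the function name `mem_summary` is made up for illustration, and the optional file argument exists only so the parsing can be tried on a sample file.

```shell
#!/bin/sh
# Print total and available memory in GiB from a meminfo-style file.
# Defaults to /proc/meminfo (Linux); values in that file are given in kB.
# mem_summary is a hypothetical helper name, not part of Feko.
mem_summary() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "total=%.1fGiB available=%.1fGiB\n", total / 1048576, avail / 1048576 }' \
        "${1:-/proc/meminfo}"
}
```

Running `mem_summary` on the 256-core machine before the run and again while Feko is running would show whether available memory drops close to zero before the crash.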
Frankly, I don't have any other small models to try running Feko on our machine. I was assuming that the failure I am observing is due to some misconfiguration or a missing parameter, since the same Feko binary runs fine on other machines in our HPC cluster.
Hi Krishna,
How many parallel cores are you using? If it is a large MLFMM simulation, you could try with fewer cores to reduce the memory requirement a bit.
(Just to test if this may be memory related)
Best regards,
Torben
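The reduced-core test could be sketched as a dry run that only prints the commands. The paths and the `-np`, `--machines-file`, `-d` and `--mpi-options` options are copied from the original post; the helper name `feko_cmd` and the list of core counts are assumptions for illustration. One thing that may be worth checking separately: the ERROR 20011 line in the post lists only `-genv` after the machines file, which could mean the unquoted `$MPI_OPTIONS` was split by the shell, so the sketch quotes it.

```shell
#!/bin/sh
# Paths copied from the original post.
RUNFEKO=/mnt/share/codes/feko/2022.2/altair/feko/bin/runfeko
MODEL=/mnt/share/benchmarks/feko/generic_sedan_parametric_1000.fek

# Build the runfeko command line for a given core count (hypothetical helper).
# MPI_OPTIONS is printed inside double quotes so it is passed as one argument
# when the printed command is pasted into a shell.
feko_cmd() {
    printf '%s %s -np %s --machines-file machinefile -d --mpi-options "%s"\n' \
        "$RUNFEKO" "$MODEL" "$1" "${MPI_OPTIONS:-}"
}

# Dry run: print one command per core count, smallest first.
for n in 32 64 128 256; do
    feko_cmd "$n"
done
```

If the 32-core command runs and the failure only appears at higher counts, that would be consistent with the memory explanation; if it fails at 32 cores as well, a configuration problem on that machine seems more likely.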
I have successfully run the same model on 128-core and 192-core machines, and I am now trying to run it on a 256-core machine with around 1 GB of L3 cache. Are you suggesting that the run could fail even with such a configuration?
Well, I have no idea about the model or which solver is used. If MLFMM is used, then you will probably see an increase in memory with more cores (256 is a lot!). Why not just try with 32 cores to see if it works?
Could you attach the model maybe?
Best regards,
Torben