Feko run fails with floating point exception

Altair Community Member

Apr 26, 2024

Updated Apr 29, 2024 by Krishna_21416

I am trying to run feko binary as shown below in and HPC cluster setup.

/mnt/share/codes/feko/2022.2/altair/feko/bin/runfeko /mnt/share/benchmarks/feko/generic_sedan_parametric_1000.fek -np $num_cores --machines-file machinefile -d --mpi-options $MPI_OPTIONS

where Mpi Options is set to -genv I_MPI_DEBUG=5 -genv I_MPI_PIN=1 -genv FI_PROVIDER=mlx -genv USE_UCX=1 -genv UCX_MAX_RNDV_RAILS=1

While this runs fine on few clusters on one specific cluster the run fails with below error.

 Feko caught signal 8 (PID 3052949)

  Memory location which caused fault: 0x3f4002e9595

 Floating point exception: Unknown exception with subcode=-6

 Feko caught signal 8 (PID 3053073)

  Memory location which caused fault: 0x3f4002e9611

 Floating point exception: Unknown exception with subcode=-6

 The following message from the master process (MYID= 0):

&nbsp;ERROR &nbsp; &nbsp;3977: Internal Feko error. Please notify the Feko support team and provide the error number, preferably together with the Feko input and output files.

and

feko_parallel(debug): Exiting with return code 2 (0, 2, 0)

RUNFEKO(debug): Forked child process "feko_parallel" with pid = 3052771

ERROR  20011:

  Error when executing the program /mnt/share/codes/feko/2022.2/altair/feko/bin/feko_parallel

&nbsp; with the options " 256 /mnt/share/benchmarks/feko/generic_sedan_parametric_1000 --machines-file machinefile -genv"

  (error codes: 2 ; 0 [Success])

  See above error message of the program for more details!

RUNFEKO(debug): Error while executing feko_parallel

What could be possible reasons for this failure and what would be required to fix the same.

Find more posts tagged with

English

Feko

Sort by:

1 - 5 of 51

Torben Voigt

Altair Employee

Apr 29, 2024

Updated Apr 29, 2024 by Torben Voigt

Hi Krishna,

I'm not an expert in HPC installations, but maybe it's a problem with missing memory. Does the problem also exist for other (small) models?

Best regards,
Torben

Krishna_21416

Altair Community Member

Apr 29, 2024

Updated Apr 29, 2024 by Krishna_21416

Hi Krishna,

I'm not an expert in HPC installations, but maybe it's a problem with missing memory. Does the problem also exist for other (small) models?

Best regards,
Torben

Frankly I dont have other small models to try running feko on our machine. I was assuming that this failure what I am observing is due to some misconfiguration or missing any parameter since the same feko binary runs fine on other machines which we have in our HPC cluster.

Torben Voigt

Altair Employee

Apr 29, 2024

Updated Apr 29, 2024 by Torben Voigt

Hi Krishna,

How many parallel cores do you use? If it is a large MLFMM simulation you may try with less cores to reduce the memory requirement a bit.

(Just to test if this may be memory related)

Best regards,
Torben

Krishna_21416

Altair Community Member

Apr 29, 2024

Updated Apr 29, 2024 by Krishna_21416

Hi Krishna,

How many parallel cores do you use? If it is a large MLFMM simulation you may try with less cores to reduce the memory requirement a bit.

(Just to test if this may be memory related)

Best regards,
Torben

I have successfully ran the same on 128 and 192 core machine and now I am trying to run the same on 256 core machine with around 1 giga bytes as L3 cache. Do you suggest that even with having such configuration the run could fail ?

Torben Voigt

Altair Employee

Apr 29, 2024

Updated Apr 29, 2024 by Torben Voigt

I have successfully ran the same on 128 and 192 core machine and now I am trying to run the same on 256 core machine with around 1 giga bytes as L3 cache. Do you suggest that even with having such configuration the run could fail ?

Well, I have no idea about the model and which solver is used. If MLFMM is used then will probably see an increase of memory with more cores (256 is a lot!!). Why not just try wiht 32 cores to see if it works?

Could you attach the model maybe?

Best regards,
Torben

🎉Community Raffle - Win $25

Feko run fails with floating point exception

Find more posts tagged with

Quick Links