Exit Code 9

Altair Forum User
Altair Employee
edited October 2020 in Community Q&A

Hello,

 

I ran a simulation on an HPC cluster using 28 slots. Our HPC system uses Sun Grid Engine. With 28 slots, the system allows a maximum of 512 GB / 28 ≈ 18 GB per process. I can enable Hyper-Threading (multi-threading) on the cluster, which doubles the number of slots to 56. With 56 slots, the peak memory per process is 512 GB / 56 ≈ 9 GB. When I tried using 56 slots, the job exited with EXIT CODE 9.
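The per-process budget above is simply the node's memory divided by the number of slots. A minimal sketch, assuming (as the post implies) a single 512 GB node whose memory is shared evenly by all SGE slots; `memory_per_slot_gb` is an illustrative helper, not part of FEKO or Grid Engine:

```python
def memory_per_slot_gb(total_memory_gb: float, slots: int) -> float:
    """Memory available to each process when a node's RAM is split evenly across slots."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return total_memory_gb / slots


# 28 physical-core slots vs. 56 Hyper-Threading slots on the same 512 GB node.
for slots in (28, 56):
    print(f"{slots} slots: {memory_per_slot_gb(512, slots):.1f} GB per process")
# 28 slots: 18.3 GB per process
# 56 slots: 9.1 GB per process
```

Hyper-Threading doubles the slot count but adds no physical memory, so each process's share is halved.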

 

  1. What does EXIT CODE 9 mean in FEKO's terminology?
  2. How can EXIT CODE 9 be avoided?

 

Thanks,

FieldForcer


Answers

  • JIF
    Altair Employee
    edited June 2018

    Hello FieldForcer,

     

    Exit code 9 seems to indicate a memory overflow, but if you supply the actual stdout, I can possibly give you more suggestions on avoiding the problem. If you supply the model, I can try to reproduce it and make recommendations. Also, what version are you using, 2018? Please remember to supply as much information as possible if you want a useful reply on the forum.

     

    We do not recommend using Hyper-Threading with FEKO; for best performance, keep it disabled.
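    JIF reads exit code 9 as a memory problem. For context, on Linux signal 9 is SIGKILL, which the kernel's out-of-memory killer sends and which a batch scheduler may also send when a job exceeds a hard memory limit; a shell reports a process killed this way as status 137 (128 + 9). The sketch below lists the generic POSIX readings of an exit status, including the 255 asked about later in this thread. It is not FEKO-specific: which convention FEKO, its MPI launcher, or Sun Grid Engine actually uses is an assumption, and `interpret_exit_status` is an illustrative helper.

```python
import signal
from typing import List

# Signal numbers known to this platform's Python (POSIX; SIGKILL == 9 on Linux).
_KNOWN_SIGNALS = {s.value: s.name for s in signal.Signals}


def interpret_exit_status(status: int) -> List[str]:
    """List generic POSIX readings of a job's exit status.

    Assumption: we do not know which convention FEKO, the MPI launcher or
    the scheduler uses, so every plausible reading is returned.
    """
    if status == 0:
        return ["success"]
    readings = []
    # Some launchers report the raw number of the signal that killed a process.
    if status in _KNOWN_SIGNALS:
        readings.append(f"raw signal {status} ({_KNOWN_SIGNALS[status]})")
    # POSIX shells report a process killed by signal N as status 128 + N.
    if status > 128 and (status - 128) in _KNOWN_SIGNALS:
        sig = status - 128
        readings.append(f"killed by signal {sig} ({_KNOWN_SIGNALS[sig]})")
    # Exit statuses keep only the low 8 bits, so exit(-1) is reported as 255.
    if status == 255:
        readings.append("process called exit(-1)")
    # The number may also simply be the program's own error code.
    readings.append(f"application-specific exit code {status}")
    return readings


for code in (9, 137, 255):
    print(code, interpret_exit_status(code))
```

    Under the raw-signal reading, 9 matches SIGKILL, which is consistent with a process being killed for exceeding its memory; the solver's stdout, as requested above, is what settles it.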

  • Altair Forum User
    Altair Employee
    edited November 2020

    Hello JIF,

     

    I have FEKO 2017. When I tried to check for updates, I got the following message:

    [Screenshots of the update-check message]

    Regardless, I checked the new features in FEKO 2018, and they are not related to my current problem. There does seem to be a feature for combined FEM/MoM solutions that could help me include an FEM current source with the rest of the model, if possible.

     

    I have attached the log file and the *.cfx file. I look forward to your suggestions.

     

    Sometimes my job exits with code 255. What is EXIT CODE 255, and how is it different from EXIT CODE 9?

     

    Thanks,

    FieldForcer

     

     


  • JIF
    Altair Employee
    edited June 2018

    Everything in your output indicates that you are running out of memory, and all the exit codes and errors are due to that. I had a look at your model, and you should do the following:

    • Don't mesh the sphere as fine as you have done. Using 1mm for a 90mm sphere is good enough and results in about 800 elements. I'm not sure why you want to mesh at 0.1mm.
    • Since you are working at 60 Hz with a small model, standard solutions will lead to numerical problems. You don't have any dielectrics, so you can turn on low frequency stabilisation.
    • Also turn on double precision (due to the low frequency and relatively small size of the model).
    • Include the resistor that you created, but seemed to have excluded.

    Other items that you can consider:

    • I see the wires intersect and thus I suspect that your wire radius is too big.
    • I changed the port to a vertex port instead of a segment port, but that is not required and won't make a difference at 60 Hz.
    • I would recommend that you union all the wires, then you can request currents on all the wires and it will ensure correct connectivity (although, for your model and wire connections, connectivity is not a problem).
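    The advice to enable low frequency stabilisation and double precision follows from how small the model is compared with the wavelength. A quick check of the electrical size, using only the 60 Hz frequency, the 90 mm sphere and the 1 mm element size mentioned above (free-space wavelength assumed; the helper names are illustrative):

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, free space


def wavelength_m(frequency_hz: float) -> float:
    """Free-space wavelength in metres."""
    return SPEED_OF_LIGHT / frequency_hz


def electrical_size(length_m: float, frequency_hz: float) -> float:
    """Length expressed in free-space wavelengths."""
    return length_m / wavelength_m(frequency_hz)


lam = wavelength_m(60.0)
print(f"wavelength at 60 Hz: {lam / 1e3:.0f} km")                   # 4997 km
print(f"90 mm sphere: {electrical_size(0.090, 60.0):.1e} wavelengths")  # 1.8e-08
print(f"1 mm element: {electrical_size(0.001, 60.0):.1e} wavelengths")  # 2.0e-10
```

    Structures this small relative to the wavelength are where standard MoM formulations are known to suffer from low-frequency breakdown, which is presumably the numerical problem referred to above.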

    The model then solved with 1 core on my laptop and produced the following fields:

    [Image: resulting field plot]

  • Altair Forum User
    Altair Employee
    edited June 2018

    Hello JIF,

     

    I appreciate your response. Thank you. Please see my inline reply. 

    ******************************************************************

    Everything in your output indicates that you are running out of memory, and all the exit codes and errors are due to that. I had a look at your model, and you should do the following:

    • Don't mesh the sphere as fine as you have done. Using 1mm for a 90mm sphere is good enough and results in about 800 elements. I'm not sure why you want to mesh at 0.1mm.

    >> I previously ran a simulation with a 1 mm mesh on the sphere. Unless I change the mesh size, I can't study the accuracy of the solution. I would like to know when mesh refinement is no longer needed: I want to understand how the solution behaves as the sphere mesh changes, and whether further refinement is warranted. If possible, and given the nature of the work involved, I would like to run a simulation with a 0.1 mm mesh on the sphere.

    • Since you are working at 60 Hz with a small model, standard solutions will lead to numerical problems. You don't have any dielectrics, so you can turn on low frequency stabilisation.

    >> After simulating the sphere with a free-space dielectric, I will change the dielectric constant. That constitutes the next part of my study.

    • Also turn on double precision (due to the low frequency and relatively small size of the model).
    • Include the resistor that you created, but seemed to have excluded.

    >> I excluded the resistor because I specified a large port impedance.


    Do I still need to add another resistor to simulate a current source? Please advise.

      

    Other items that you can consider:

    • I see the wires intersect and thus I suspect that your wire radius is too big.

    >> Thanks for pointing that out. My wire radius is 1 mm. To avoid the intersection, I can reduce the number of turns in the coil to 10.

    • I changed the port to a vertex port instead of a segment port, but that is not required and won't make a difference at 60 Hz.
    • I would recommend that you union all the wires, then you can request currents on all the wires and it will ensure correct connectivity (although, for your model and wire connections, connectivity is not a problem).

    ******************************************************************

    Thanks again for all your inputs.

    FieldForcer

  • JIF
    Altair Employee
    edited June 2018

    Hello FieldForcer,

     

    >> I previously ran a simulation with a 1 mm mesh on the sphere. Unless I change the mesh size, I can't study the accuracy of the solution. I would like to know when mesh refinement is no longer needed: I want to understand how the solution behaves as the sphere mesh changes, and whether further refinement is warranted. If possible, and given the nature of the work involved, I would like to run a simulation with a 0.1 mm mesh on the sphere.

    Doing a mesh convergence study is a good idea. I simply noticed that you were using 20 (and more) cores to solve a problem that solves in a short time on my laptop using a single core. You can mesh it at 0.1 mm if you want to see the differences (and I don't think there would be much of a difference). But I would suggest starting with 1 mm and seeing whether there are changes when you go to 0.5 mm (halving the size should result in roughly 4 times more triangles, and remember that MoM scales as O(N^2) in memory and O(N^3) in run time). Each time you halve the element size, the required memory and run time go up. Don't start with a super fine mesh; you want to use the fewest elements that provide an accurate answer. So, if you really want to run it at 0.1 mm, fine, but then you have to be happy with requiring 100x more memory and simulations that take 1000x longer (for no good reason in my opinion).
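    The scaling rules above can be turned into a quick cost estimate. The sketch assumes, as stated in the post, that the number of surface triangles grows as the inverse square of the element size and that a dense MoM solve needs O(N^2) memory and O(N^3) run time; constants and solver overheads are ignored, so only the ratios mean anything. `mom_cost_ratios` is an illustrative helper:

```python
def mom_cost_ratios(h_old_mm: float, h_new_mm: float) -> dict:
    """Relative element count, memory and run time when refining a surface mesh.

    Assumes N ~ h^-2 (triangles), memory ~ N^2 and run time ~ N^3 (dense MoM).
    """
    elements = (h_old_mm / h_new_mm) ** 2
    return {"elements": elements, "memory": elements ** 2, "run_time": elements ** 3}


for h_new in (0.5, 0.1):
    r = mom_cost_ratios(1.0, h_new)
    print(f"1 mm -> {h_new} mm: {r['elements']:,.0f}x elements, "
          f"{r['memory']:,.0f}x memory, {r['run_time']:,.0f}x run time")
```

    Applied literally, these rules give about 100x more elements, roughly 10,000x the memory and 1,000,000x the run time for 0.1 mm, so the 100x/1000x figures quoted above, if anything, understate the cost of a dense MoM solve.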

     

    >> After simulating the sphere with a free-space dielectric, I will change the dielectric constant. That constitutes the next part of my study.

    FEKO does have some dielectric solution methods that are more stable at lower frequencies, like the VEP (volume equivalence principle). I think low frequency stabilisation works with VEP, but that will have to be tested. It definitely does not work with SEP (surface equivalence principle), but I suspect it might still be supported with lossy metals (not sure).

     

    To be honest, although FEKO has features that allow it to solve at very low frequencies, it really is not designed to work at 60 Hz (not for the type of problem that you seem to be interested in). I would rather suggest that you use Flux (also part of HyperWorks).

     

    >> I excluded the resistor because I specified a large port impedance.

    The port impedance does not add any resistance, you still need the load. That is a port reference impedance - we have an open issue to rename the text on the dialog.
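    The distinction between a reference impedance and a load can be illustrated with ordinary circuit theory. A reference impedance Z0 only changes how a port's response is normalised, for example in the reflection coefficient Gamma = (Z_L - Z0) / (Z_L + Z0); it is not an element of the circuit, so the voltage actually developed is set by the source and the real load. A minimal sketch with hypothetical values (an ideal 10 mA current source and a 50 ohm load, neither taken from the model):

```python
def reflection_coefficient(z_load: complex, z_ref: complex) -> complex:
    """Reflection coefficient of z_load normalised to the reference impedance z_ref."""
    return (z_load - z_ref) / (z_load + z_ref)


def load_voltage(i_source: complex, z_load: complex) -> complex:
    """Voltage across a load driven by an ideal current source; z_ref plays no part."""
    return i_source * z_load


I_SOURCE = 0.01  # hypothetical 10 mA current source
Z_LOAD = 50.0    # hypothetical load resistor, ohms

for z_ref in (50.0, 1e6):
    gamma = reflection_coefficient(Z_LOAD, z_ref)
    print(f"Z0 = {z_ref:g} ohm: |Gamma| = {abs(gamma):.4f}")
# The physical load voltage is the same whichever Z0 is chosen.
print(f"load voltage: {abs(load_voltage(I_SOURCE, Z_LOAD)):.2f} V")
```

    Changing Z0 from 50 ohm to 1 Mohm changes |Gamma| but leaves the load voltage untouched, which is why a large reference impedance cannot stand in for the resistor.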

     

    All the best with your investigation. ;-)

  • Altair Forum User
    Altair Employee
    edited July 2018

    Hello JIF,

     

    Doing a mesh convergence study is a good idea. I simply noticed that you were using 20 (and more) cores to solve a problem that solves in a short time on my laptop using a single core. You can mesh it at 0.1 mm if you want to see the differences (and I don't think there would be much of a difference). But I would suggest starting with 1 mm and seeing whether there are changes when you go to 0.5 mm (halving the size should result in roughly 4 times more triangles, and remember that MoM scales as O(N^2) in memory and O(N^3) in run time). Each time you halve the element size, the required memory and run time go up. Don't start with a super fine mesh; you want to use the fewest elements that provide an accurate answer. So, if you really want to run it at 0.1 mm, fine, but then you have to be happy with requiring 100x more memory and simulations that take 1000x longer (for no good reason in my opinion).

    1. The simulation with the 0.1 mm mesh did not complete on 20 nodes, and it would have taken even longer on a laptop, so I ran another simulation with a coarser mesh. Lack of memory is definitely a problem on a single laptop, and it only gets worse with the 0.1 mm mesh. While studying mesh convergence, the same model shows a peak E-field of 90 mV/m in your result and 7500 kV/m in mine. That is a difference of many orders of magnitude, which cannot come from running the same model on different machines.
    2. The problem with the coarse mesh (1 mm element length) took about 16 hours to solve on 20 nodes.
    3. Regarding "for no good reason in my opinion":

      This is a standard computational problem, where the element size is 10^3 to 10^6 times smaller than the overall model size. I am sure there are FEM methods for solving such problems. Is there a recipe in FEKO that would still produce a solution?

    4. FEKO's developers may already have automated mesh convergence through error estimation. If so, perhaps I can specify the solution accuracy as a goal in FEKO, say 1%. The first run would tell me a good mesh size, and I could then keep that mesh for future simulations.
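    Whether or not FEKO offers such error estimation, the convergence study can be scripted by hand along the lines suggested earlier in the thread: start coarse, halve the element size, re-solve, and stop once the quantity of interest changes by less than the target (1% here). The change between successive refinements is only a proxy for the true error. A minimal sketch; `solve_peak_field` is a hypothetical stand-in for launching a FEKO run and reading back the peak field, not a FEKO API, and its formula is a toy:

```python
from typing import Callable, List, Tuple


def mesh_convergence(
    solve: Callable[[float], float],
    h_start_mm: float,
    rel_tol: float = 0.01,
    h_min_mm: float = 0.1,
) -> Tuple[float, bool, List[Tuple[float, float]]]:
    """Halve the element size until the result changes by less than rel_tol.

    Returns (last element size, converged flag, [(size, result), ...]).
    Refinement stops at h_min_mm so the cost cannot grow without bound.
    """
    h = h_start_mm
    history = [(h, solve(h))]
    while h / 2 >= h_min_mm:
        h /= 2
        value = solve(h)
        previous = history[-1][1]
        history.append((h, value))
        if abs(value - previous) <= rel_tol * abs(previous):
            return h, True, history
    return h, False, history


def solve_peak_field(h_mm: float) -> float:
    """Hypothetical solver stand-in: converges to 1.0 with an O(h^2) error."""
    return 1.0 + 0.05 * h_mm ** 2


h, converged, history = mesh_convergence(solve_peak_field, h_start_mm=1.0)
print(h, converged)  # 0.25 True
```

    With a real solver, each call to `solve` would be a full run, so the `h_min_mm` floor keeps the study from wandering into the memory and run-time regime discussed above.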

     

    The port impedance does not add any resistance, you still need the load. That is a port reference impedance - we have an open issue to rename the text on the dialog.

    1. From what I can see, the impedance at a port acts like a resistance, so adding a load should add to the existing load. I will run the simulation without the port impedance and add a load instead. It will be interesting to see the difference in the results.

     

    Thanks for the best wishes,

    FieldForcer