Results show NaNs when AcuSolve is run on multiple processors

Josh Fontana Altair Community Member
edited February 2022 in Community Q&A

Hi Everyone,

I am trying to get AcuSolve working on my university's HPC cluster, and I am using a simple internal flow test case. When I solve the problem with one processor and multiple threads, it gives the expected results, but when I run with multiple processors (even on the same node of the cluster), the results show up as NaN values. The solver did not report any errors during execution (none that I can see in the *.Log file).

image

Results from a run with 1 processor ^^^^^

image

Results from run with multiple processors (MPI) ^^^^^^

 

Do any of you know why this could be happening?

I am loading the results into Altair SimLab to view them. Could this just be an issue with the way SimLab is reading the results?

Thanks,

Josh

Answers

  • acupro
    Altair Employee
    edited January 2022

    If you can - please attach the .Log files for the two cases.

    Which version are you using - both SimLab and CFD Solvers (AcuSolve) ?

  • Josh Fontana
    Altair Community Member
    edited February 2022

    Hi acupro,

    Thanks for the reply. The logs are attached. I am using SimLab version 2021.1 and AcuSolve version 2021.2.

     

    Thanks,

    Josh

  • acupro
    Altair Employee
    edited February 2022

    Are you able to install SimLab 2021.2 - to be consistent with the solver version 2021.2?

    Are both SimLab and the solver on the same Linux machine?

    Can you post screenshots of the solver launch panel and your settings for each case?

    The statistics at the end of the two runs are approximately the same, so it's likely an issue with SimLab post-processing. That's the reason for trying SimLab 2021.2 and simply opening the results again.

  • Josh Fontana
    Altair Community Member
    edited February 2022

    Hi acupro,

    The solver is running on our Linux HPC cluster, but SimLab is running on a Windows 10 workstation. I created the solver input file in SimLab 2021.1 on the Windows workstation and transferred the necessary files to the cluster. There I solved the problem using the "acuRun" script (version 2021.2), then transferred the results (the entire run directory, including the "ACUSIM.DIR" directory) back to the Windows workstation and imported the *.Log file into SimLab to view them.

    Are these the images you are looking for?

    image

    image

    This last one I re-created after-the-fact (since I couldn't find a way to show this info for an existing solution):

    image

     

    In the "Format and Execute Options" panel, I noticed that the "number of processors" is set to 4. I'm not sure whether this must match the number I run with on the HPC cluster, but coincidentally I did run on 4 processors there as well.

    If you don't see anything amiss here, I will see what I can do about the version mismatch.

     

    Thanks,

    Josh

  • acupro
    Altair Employee
    edited February 2022

    I think I see the problem.  I looked at the Log files again and saw this is how you submitted the run for the single-core solve:

    acuRun -fmt ascii -pb cone_flow_solid -nt 4 -hosts compute-11-1

    That option (-fmt ascii) writes the results in ASCII format rather than the default - making SimLab unable to import the results.  I have no idea why that option is even there...

    If you want to run again with a single core, use this:

    acuRun -pb cone_flow_solid -np 1 -hosts compute-11-1

    We also typically do not specify the -nt flag, using the default instead, so for a four-core run, I would use:

    acuRun -pb cone_flow_solid -np 4 -hosts compute-11-1

    By default, that would run a single process, but with 4 threads.  If you really want four processes, each with a single thread, use:

    acuRun -pb cone_flow_solid -np 4 -nt 1 -hosts compute-11-1
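
The command variants above differ only in their -np and -nt values, so they can be assembled by a small helper when scripting several runs. The following is a minimal Python sketch, assuming only the flags quoted in this thread (-pb, -np, -nt, -hosts). The function names are hypothetical, and the script only builds the command line; it does not launch the solver.

```python
import shlex


def build_acurun_command(problem, num_procs, hosts, num_threads=None):
    """Build an acuRun argument list using only flags quoted in this thread.

    num_threads=None omits -nt so that acuRun falls back to its default.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if num_threads is not None and num_threads < 1:
        raise ValueError("num_threads must be at least 1 when given")

    cmd = ["acuRun", "-pb", problem, "-np", str(num_procs)]
    if num_threads is not None:
        cmd += ["-nt", str(num_threads)]
    cmd += ["-hosts", hosts]
    return cmd


def format_command(cmd):
    """Render the argument list as a shell-safe string for a job script."""
    return shlex.join(cmd)
```

For example, `format_command(build_acurun_command("cone_flow_solid", 4, "compute-11-1", num_threads=1))` reproduces the last command above. Keeping the arguments as a list until the final step avoids quoting problems if a problem name or host list ever contains unusual characters.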

  • Josh Fontana
    Altair Community Member
    edited February 2022

    Actually the "-fmt ascii" case is the one that works here. That is the one that was run as a single process (with multiple threads).

    Sorry for the inconsistency between the two cases I presented. I had tried many cases, and eventually wrote some output in ASCII format so that I could check for NaNs in the actual data and compare the working runs against the failing ones to see what was going on. Every time (with ASCII or not), the cases that used multiple processes (MPI) could not be read into SimLab correctly, and the ones that used a single process could. I also did not find any NaNs in the actual data for the failed cases.

     

    For the multi-process cases, I first ran across multiple hosts, which forces the solver to use multiple processes by default. Those results did not load into SimLab correctly, so I then used "-nt 1 -np 4" to run multiple processes on one host, and the results still did not load into SimLab correctly, as you see here.
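
A check like the one described above (searching ASCII output for NaNs) can be scripted. The following is a minimal Python sketch, assuming the ASCII results are plain text with numeric tokens separated by whitespace or punctuation. The function names and the idea of scanning a whole directory are illustrative assumptions, not part of AcuSolve or SimLab.

```python
import re
from pathlib import Path

# Matches nan, NaN, -nan, +NAN, etc. as a standalone token,
# but not as part of a longer word or number.
NAN_TOKEN = re.compile(r"(?<![\w.])[+-]?nan(?![\w.])", re.IGNORECASE)


def find_nan_lines(text):
    """Return (line_number, line) pairs for lines containing a NaN token."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if NAN_TOKEN.search(line)
    ]


def scan_directory(root, pattern="*"):
    """Scan every file under root (hypothetical layout) and collect NaN hits."""
    hits = {}
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps the scan going if a file is not pure text.
            text = path.read_text(errors="replace")
        except OSError:
            continue
        found = find_nan_lines(text)
        if found:
            hits[str(path)] = found
    return hits
```

An empty result from `scan_directory` on a failing run's output would match what is reported above: the data itself contains no NaNs, which points at the reader rather than the solver.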

  • acupro
    Altair Employee
    edited February 2022

    Sorry - I misread the labels on those initial images and mixed up which was which.

    Can you zip up the ACUSIM.DIR and .Log file for one of the cases you can't read into SimLab?  If it's too big to attach to this post, you can place it in my FTAM link:

    https://ftam1.altair.com/filedrop/~Pq5FMe

  • Josh Fontana
    Altair Community Member
    edited February 2022

    Here you go. Thanks for all your help with this.

  • acupro
    Altair Employee
    edited February 2022

    This opens fine for me in AcuFieldView 2021.2, HyperWorks CFD 2021.2, and SimLab 2021.2.

    image

     

    This looks to be a version issue - maybe a bug that has been fixed since an earlier version.

  • Josh Fontana
    Altair Community Member
    edited February 2022

    My university is working on updating to the new version, but I assume this is the issue since you were able to load the results.

    Thanks,

    Josh