Results show NaNs when AcuSolve is run on multiple processors
Hi Everyone,
I am trying to get Acusolve working on my university's HPC, and I have a simple internal flow test case that I am using. When I solve the problem with one processor, multiple threads, it gives the expected results, but when I run with multiple processors -- even on the same node of the HPC -- the results show up as NaN values. The solver did not give any error during execution (none that I can see in the *.Log file).
[Image: results from a run with 1 processor]
[Image: results from a run with multiple processors (MPI)]
Do any of you know why this could be happening?
I am loading the results into Altair SimLab to view them. Could this just be an issue with the way SimLab is reading the results?
Thanks,
Josh
Answers
-
If you can - please attach the .Log files for the two cases.
Which version are you using - both SimLab and CFD Solvers (AcuSolve) ?
0 -
Hi acupro,
Thanks for the reply. The logs are attached. I am using SimLab version 2021.1 and AcuSolve version 2021.2.
Thanks,
Josh
0 -
Are you able to install SimLab 2021.2 - to be consistent with the solver version 2021.2?
Are both SimLab and the solver on the same Linux machine?
Can you post screenshots of the solver launch panel and your settings for each case?
The statistics at the end of the runs are approximately the same - so likely it's some issue with SimLab post. That's the reason for trying SimLab 2021.2 - and just opening the results again.
0 -
Hi acupro,
The solver is running on our Linux HPC cluster, but SimLab is running on a Windows 10 workstation. I created the solver input file in SimLab 2021.1 on the Windows workstation, transferred the necessary files over to the computing cluster, solved the problem using the "acuRun" script (version 2021.2) on the cluster, and then transferred the results (the entire run directory, including the "ACUSIM.DIR" directory) back to the Windows workstation for viewing, importing the *.Log file into SimLab to view the results.
Are these the images you are looking for?
This last one I re-created after-the-fact (since I couldn't find a way to show this info for an existing solution):
In the "Format and Execute Options" panel, I noticed that the "number of processors" is set to 4. I'm not sure whether this must match the number I run with on the HPC cluster, but coincidentally I did run on 4 processors there as well.
If you don't see anything amiss here, I will see what I can do about the version mismatch.
Thanks,
Josh
0 -
I think I see the problem. I looked at the Log files again and saw this is how you submitted the run for the single-core solve:
acuRun -fmt ascii -pb cone_flow_solid -nt 4 -hosts compute-11-1
That option (-fmt ascii) writes the results in ASCII format rather than the default - making SimLab unable to import the results. I have no idea why that option is even there...
If you want to run again with a single core, use this:
acuRun -pb cone_flow_solid -np 1 -hosts compute-11-1
We also typically do not specify the -nt flag, using the default instead, so for a four-core run, I would use:
acuRun -pb cone_flow_solid -np 4 -hosts compute-11-1
By default, that would run a single process, but with 4 threads. If you really want four processes, each with a single thread, use:
acuRun -pb cone_flow_solid -np 4 -nt 1 -hosts compute-11-1
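When trying several process/thread combinations like the ones above, it can help to generate the command lines from one place so the cases stay consistent. The following is only a minimal sketch: it uses just the flags that appear in this thread (-pb, -np, -nt, -hosts), and the function name `build_acurun_command` and its parameter names are made up for illustration, not part of any Altair tool.

```python
import shlex


def build_acurun_command(problem, num_procs=None, num_threads=None, hosts=None):
    """Assemble an acuRun argument list (hypothetical helper).

    Only flags quoted in this thread are used. A flag is omitted when its
    argument is None, so acuRun falls back to its own default for it.
    """
    cmd = ["acuRun", "-pb", problem]
    if num_procs is not None:
        cmd += ["-np", str(num_procs)]
    if num_threads is not None:
        cmd += ["-nt", str(num_threads)]
    if hosts is not None:
        cmd += ["-hosts", hosts]
    return cmd


if __name__ == "__main__":
    # The two four-core variants discussed above, printed as shell commands.
    default_threads = build_acurun_command(
        "cone_flow_solid", num_procs=4, hosts="compute-11-1")
    four_processes = build_acurun_command(
        "cone_flow_solid", num_procs=4, num_threads=1, hosts="compute-11-1")
    print(shlex.join(default_threads))
    print(shlex.join(four_processes))
```

The list form can be passed directly to `subprocess.run` on the cluster; printing it with `shlex.join` (Python 3.8+) is just a convenient way to check the command before submitting.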
0 -
Actually the "-fmt ascii" case is the one that works here. That is the one that was run as a single process (with multiple threads).
Sorry for the inconsistency between the two cases I presented. I had tried many cases, and eventually wrote some output in ASCII format so that I could check for NaNs in the actual data, or compare the data between the working runs and the failing ones, to see what is going on. Every time (with ASCII or not), the cases that used multiple processes (MPI) could not be read into SimLab correctly, and the ones that used a single process could. And I did not find any NaNs in the actual data for the failed cases.
For the multi-process cases, I tried them across multiple hosts, at first, forcing it to use multiple processes by default. These cases did not load into SimLab correctly, so then I used "-nt 1 -np 4" to make it run multiple processes on one host, and it still did not load into SimLab correctly, as you see here.
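For anyone who wants to repeat the NaN check on ASCII output, here is a rough sketch of one way to do it. It assumes the files are plain text with whitespace- or comma-separated numbers; the actual layout of the files under ACUSIM.DIR is not described in this thread, so treat the helper names and the example path as placeholders.

```python
import math
from pathlib import Path


def count_nans(path):
    """Count tokens in a text file that parse as a floating-point NaN."""
    count = 0
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            for token in line.replace(",", " ").split():
                try:
                    value = float(token)
                except ValueError:
                    continue  # not a number (header, label, ...)
                if math.isnan(value):
                    count += 1
    return count


def scan_directory(directory, pattern="*"):
    """Return {file name: NaN count} for matching files that contain NaNs."""
    hits = {}
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            found = count_nans(path)
            if found:
                hits[path.name] = found
    return hits


# Example call (the directory name is an assumption about the run layout):
# print(scan_directory("ACUSIM.DIR"))
```

An empty dictionary from `scan_directory` means no NaN tokens were found in any matching file, which would point toward the reader rather than the solver output.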
0 -
Sorry - I mis-read the labels on those initial images, for which was which.
Can you zip up the ACUSIM.DIR and .Log file for one of the cases you can't read into SimLab? If it's too big to attach to this post, you can place it in my FTAM link:
0 -
Here you go. Thanks for all your help with this.
0 -
This opens fine for me in AcuFieldView 2021.2, HyperWorks CFD 2021.2, and SimLab 2021.2.
This looks to be a version issue - maybe a bug-fix from an earlier version.
0 -
My university is working on updating to the new version, but I assume this is the issue since you were able to load the results.
Thanks,
Josh
0