GPU acceleration on RTX 3090 GPU

Rutwik Gulakala · April 2022

Hello all

I have a question regarding GPU acceleration in OptiStruct.

My PC has an 8-core Ryzen 5 5800X processor, 32 GB of DDR4 RAM, and an RTX 3090 GPU with 24 GB of VRAM. The GPU is capable and fast for the AI research that I do, but when I run OptiStruct the GPU acceleration is not visible at all.

However, when I run OptiStruct on the PCs at my institute, which are all Dell Precision workstations with 6-core Xeon processors, 128 GB of RAM, and a Quadro 4000 with 2 GB of VRAM, the GPU acceleration is very significant and the simulation runs faster.

Is this because OptiStruct isn't compatible with the GeForce RTX series, or do you think something else might be the issue? Perhaps the CUDA and graphics driver versions?

My driver versions are

Display driver - 512.15

CUDA - V11.3

Thank you for your time in advance.

Regards

Rutwik

PaulAltair · April 2022

I believe only Tesla and Quadro GPUs are officially supported. I don't think the RTX cards have enough double-precision capability.

In general, consumer video cards are not 'officially' supported by HyperWorks (even for graphics).

Rutwik Gulakala · April 2022

Hello Paul

Thank you for pointing it out. I am confused about this. The RTX 3090 has 556.0 GFLOPS of double-precision performance, yet OptiStruct doesn't show any acceleration even though it reports using 1 NVIDIA GPU for acceleration. Nevertheless, I bought a Quadro K4000 off eBay for OptiStruct, and I am still unable to see any GPU acceleration in my simulation. I ran the same model on my friend's PC, which runs Linux and has a Quadro K2200, and the GPU acceleration works there. I would like to know if there is a driver compatibility issue here. I really hope someone can help me with this.

Regards

Rutwik

PaulAltair · May 2022

What comparisons do you have on GPU performance for your friend's and institute's machines? Are you using the same OS, model, and version? How are you quantifying that the GPU is 'working', and the speedup, on the systems where you see it? I wouldn't expect those systems to show a great speedup with GPU either.

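The K4000, despite being pretty old, is rated at 1244 GFLOPS, and the K2200 is comparable at 1439 GFLOPS, but those are single-precision ratings, so they are not directly comparable with the 556 GFLOPS double-precision figure for your 3090. On the same single-precision basis, the recommended cards (GP100, GV100) are up at 10,329 and 14,817 GFLOPS respectively.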

As an example, my own graphics card is rated at 3031 GFLOPS for double-precision floating point, and if I activate the GPU on a test run it is actually slower than using DDM on the CPU (so I usually use that rather than the GPU).

On my test run, DDM with 4 cores takes 32 min; SMP with 4 cores + GPU takes 49 min.
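
The two timings above can be compared directly. A minimal Python sketch, in which the helper names are purely illustrative and only the 32 min and 49 min figures come from the test run above:

```python
def to_seconds(minutes, seconds=0):
    """Convert a wall-clock time given as minutes and seconds into seconds."""
    return minutes * 60 + seconds


def speedup(baseline_s, candidate_s):
    """Return baseline/candidate; values below 1.0 mean the candidate is slower."""
    return baseline_s / candidate_s


ddm_cpu_s = to_seconds(32)   # DDM, 4 cores, no GPU
smp_gpu_s = to_seconds(49)   # SMP, 4 cores + GPU

gpu_speedup = speedup(ddm_cpu_s, smp_gpu_s)
print(f"SMP + GPU relative to DDM on CPU: {gpu_speedup:.2f}x")
```

A value below 1.0 here means the GPU run was the slower of the two, which is the case for these numbers.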

Rutwik Gulakala · May 2022

Hello Paul

Thank you for your quick response, as always.

I am very confused about this because I ran the same file on my friend's workstation, which has a Quadro K2200, using the following options:

-optskip -core In -nt 8 -gpu

His configuration, from the .out file, is:

************************************************************************
** **
** **
** Altair OptiStruct(TM) 2019 **
** **
** Advanced Engineering Analysis, Design and **
** Optimization Software from Altair Engineering, Inc. **
** **
** **
** Linux 5.11.0-40-generic tandale-Precision-36 **
** 8 CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz **
** 31938 MB RAM, 2033 MB swap **
** **
** Build: 0960178_0833190_Le64RBNW8UH14M:157056-000 2000004000020016 **
************************************************************************
** COPYRIGHT (C) 1996-2019 Altair Engineering, Inc. **
** All Rights Reserved. Copyright notice does not imply publication. **
** Contains trade secrets of Altair Engineering, Inc. **
** Decompilation or disassembly of this software strictly prohibited. **
************************************************************************

*** OptiStruct defaults set from:
install config file: /home/tandale/2019/altair/hwsolvers/optistruct.cfg.

NOTE # 9199
MSGLMT=STRICT is active, all messages will be printed.
You can suppress some less important warning messages by use of
MSGLMT=BRIEF or UNREF (in config file or in the input data).

*** INFORMATION # 9196
Using 1 NVIDIA cards for GPU acceleration.

It took 5min 4sec to execute 11 iterations.

This is the result from my PC, which has a Quadro K4000, using the same options apart from the thread count:

-optskip -core In -nt 12 -gpu

************************************************************************
** **
** **
** Altair OptiStruct(TM) 2019 **
** **
** Advanced Engineering Analysis, Design and **
** Optimization Software from Altair Engineering, Inc. **
** **
** **
** Windows 10 (Build 9200) DESKTOP-LR7O0T6 **
** 12 CPU: AMD Ryzen 5 2600 Six-Core Processor **
** 25780 MB RAM, 37582 MB swap **
** **
** Build: 0960191med33190_Ce64RBNW8UH14M:157056-000 2000004000020016 **
************************************************************************
** COPYRIGHT (C) 1996-2019 Altair Engineering, Inc. **
** All Rights Reserved. Copyright notice does not imply publication. **
** Contains trade secrets of Altair Engineering, Inc. **
** Decompilation or disassembly of this software strictly prohibited. **
************************************************************************

*** OptiStruct defaults set from:
install config file: C:/Program Files/Altair/2019/hwsolvers/optistruct.cfg.

NOTE # 9199
MSGLMT=STRICT is active, all messages will be printed.
You can suppress some less important warning messages by use of
MSGLMT=BRIEF or UNREF (in config file or in the input data).

*** INFORMATION # 9196
Using 1 NVIDIA cards for GPU acceleration.

It took 11min 29 sec on my PC with Quadro K4000 GPU.
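
For comparing runs like the two above, the machine details can be pulled out of the .out header automatically. The following is only a sketch: the function name and the regular expressions are assumptions based on the two banners pasted in this post, not a documented OptiStruct output format, and other versions may print the header differently.

```python
import re


def parse_out_header(text):
    """Extract CPU, RAM and GPU information from an OptiStruct .out header.

    The patterns are inferred from the banners shown in this thread only.
    """
    cpu = re.search(r"(\d+) CPU:\s*(.+?)\s*\*\*", text)
    ram = re.search(r"(\d+) MB RAM", text)
    gpu = re.search(r"Using (\d+) NVIDIA cards? for GPU acceleration", text)
    return {
        "cpu_count": int(cpu.group(1)) if cpu else None,
        "cpu_name": cpu.group(2) if cpu else None,
        "ram_mb": int(ram.group(1)) if ram else None,
        # No INFORMATION # 9196 line is treated as "no GPU in use".
        "gpu_cards": int(gpu.group(1)) if gpu else 0,
    }


# Lines copied from the Windows run above.
SAMPLE = """\
** 12 CPU: AMD Ryzen 5 2600 Six-Core Processor **
** 25780 MB RAM, 37582 MB swap **
*** INFORMATION # 9196
Using 1 NVIDIA cards for GPU acceleration.
"""

info = parse_out_header(SAMPLE)
print(info)
```

Note that in the Windows banner the CPU count is 12 for a six-core processor, so that number appears to count logical processors rather than physical cores.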

PaulAltair · May 2022

OK, from what you show there, the first thing I note is that you are running '-nt 12', but your Ryzen only has 6 physical cores. In general, hyperthreaded virtual cores won't help much with solver performance (and may even hinder it), so you may well get better run performance (or at least no worse) with '-nt 6'.

On your friend's Intel (which has 8 cores, so '-nt 8' is OK), what is the run performance without the '-gpu' option? I.e., is the -gpu option actually speeding up the run?

The floating-point performance of the Intel i7-9700 is over 50% better than your Ryzen's according to PassMark, so an 'expected' like-for-like run time on your machine would be around 8 minutes, I think (running -nt 6 and assuming the GPU isn't helping in either case).
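
As a rough check of that estimate, here is a short Python sketch. It assumes that run time scales inversely with the CPU's floating-point score, which is only a crude approximation, and it takes "over 50% better" as exactly 1.5; the two measured times are the ones reported for the Linux and Windows runs.

```python
def to_seconds(minutes, seconds=0):
    """Convert minutes and seconds into seconds."""
    return minutes * 60 + seconds


def fmt(seconds):
    """Format a duration in seconds as 'Xm YYs'."""
    m, s = divmod(round(seconds), 60)
    return f"{m}m {s:02d}s"


friend_s = to_seconds(5, 4)     # i7-9700 run, 11 iterations
mine_s = to_seconds(11, 29)     # Ryzen 5 2600 run, same model

CPU_RATIO = 1.5                 # assumption: i7-9700 is 1.5x faster in floating point

expected_s = friend_s * CPU_RATIO
print("expected on the Ryzen:", fmt(expected_s))
print("observed on the Ryzen:", fmt(mine_s))
print(f"observed / expected: {mine_s / expected_s:.2f}")
```

With these inputs the estimate comes out a little under 8 minutes, so the observed 11 min 29 s is still noticeably longer than the CPU difference alone would suggest.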

If you can, try -nt 8 on your friend's machine and -nt 6 on yours, with no -gpu on either, and see what you get in terms of solution time. If you can share the .out files, then maybe we can get more information too.
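
That comparison could be scripted along the following lines. This is only a sketch: the launcher name `optistruct` and the input file name `model.fem` are placeholders (assumptions, adjust them to your installation), and the option list simply reuses the options already shown in this thread. The example only prints the command lines; `time_run` is there for when you want to actually launch and time them.

```python
import shlex
import subprocess
import time

SOLVER = "optistruct"  # assumption: name of the solver launcher on your PATH


def build_command(input_file, threads, use_gpu):
    """Build the command line using the options quoted earlier in the thread."""
    cmd = [SOLVER, input_file, "-optskip", "-core", "In", "-nt", str(threads)]
    if use_gpu:
        cmd.append("-gpu")
    return cmd


def time_run(cmd):
    """Run one job and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def plan(input_file, physical_cores):
    """Two runs per machine: threads = physical cores, without and with the GPU."""
    return [build_command(input_file, physical_cores, gpu) for gpu in (False, True)]


if __name__ == "__main__":
    # Print the planned commands for a 6-core machine; nothing is executed here.
    for cmd in plan("model.fem", 6):
        print(shlex.join(cmd))
```

Running the same pair on each machine, with the thread count set to that machine's physical core count, gives a like-for-like view of whether -gpu helps at all.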

Finally, is the analysis you are running a linear static solution, eigenmode, or nonlinear? In the 2019 version that you are using, only linear static and modal were supported for GPU, I think (the documentation still states this for later versions, but MUMPS is supported in more recent versions). 2019 is 4 versions old now since the release of 2022.

Rutwik Gulakala · May 2022

Hello Paul

Thank you again for your reply. I feel so dumb, man. I forgot that the 9700 has 8 physical cores and mine has only 6. Yes, you might be completely correct. I will do what you have suggested and see how fast the simulation runs.

Regarding the analysis, I am doing a linear static simulation. I am really sorry for wasting your time, and I really appreciate your patience in addressing and pointing out my mistake. Thank you so much once again. I will reply with my findings.

Regards

Rutwik

PaulAltair · May 2022

Hey, no, it is fine. Finding the best combination of cores/GPU, LDM/DDM/SMP, RAMDISK, etc. for a job can be tricky sometimes. My GPU experience is limited; as I said, I found that with my hardware it didn't help much, and I just didn't go back to it after that. Linux vs. Windows could be a factor too, for sure. I think I recall some other solvers showing better GPU behaviour under Linux, so that could be the case here too (unfortunately, I don't have a Linux machine to test GPU on). Good luck anyway.
