GPU acceleration on RTX 3090 GPU

Rutwik Gulakala
Rutwik Gulakala Altair Community Member
edited May 2022 in Community Q&A

Hello all,

I have a question regarding GPU acceleration in OptiStruct.

My PC has an 8-core Ryzen 5 5800x processor, 32 GB of DDR4 RAM and an RTX 3090 GPU with 24 GB of VRAM. Although the GPU is quite capable and fast for the AI research that I do, I see no GPU acceleration at all when I run OptiStruct.

However, when I run OptiStruct on the PCs at my institute, which are all Dell Precisions with 6-core Xeon processors, 128 GB of RAM and a Quadro 4000 with 2 GB of VRAM, the GPU acceleration is very significant and the simulation runs faster.

Is this because OptiStruct isn't compatible with the GeForce RTX series, or could something else be the issue, such as the CUDA and graphics driver versions?

 

My driver versions are:

Display driver - 512.15
CUDA - V11.3

Thank you in advance for your time.

Regards,
Rutwik

Best Answer

  • PaulAltair
    PaulAltair
    Altair Employee
    edited April 2022 Answer ✓

    I believe only Tesla and Quadro GPUs are officially supported; I don't think the RTX cards have enough double-precision capability.

    In general, consumer video cards are not 'officially' supported (even for graphics) by HyperWorks.

Answers

  • Rutwik Gulakala
    Rutwik Gulakala Altair Community Member
    edited April 2022

    Hello Paul,

    Thank you for pointing that out, but I am still confused. The RTX 3090 has 556.0 GFLOPS of double-precision performance, yet OptiStruct doesn't show any acceleration even though it reports using 1 NVIDIA GPU for acceleration. I also bought a Quadro K4000 off eBay for OptiStruct, and I still don't see any GPU acceleration in my simulation. I ran the same simulation on my friend's Linux PC with a Quadro K2200, and GPU acceleration works there. Could there be a driver compatibility issue? I really hope someone can help me with this.

    Regards,
    Rutwik

  • PaulAltair
    PaulAltair
    Altair Employee
    edited May 2022

    What performance comparisons do you have for GPU on your friend's and your institute's machines? And are you using the same OS, model and version? How are you quantifying the GPU 'working' and the speedup on the systems where you see it? I wouldn't expect those systems to show a great speedup with GPU either.

    The K4000, despite being pretty old, is rated at 1244 GFLOPS for floating point (so over 2x your 3090), but the recommended cards (GP100, GV100) are up at 10,329 and 14,817 GFLOPS respectively. (The K2200 is at 1439 GFLOPS, so comparable to the K4000.)

    As an example, my own graphics card is rated at 3031 GFLOPS for double-precision floating point, and if I activate GPU on a test run it is actually slower than using DDM with CPU (so I usually use that rather than GPU).

    On my test run, DDM with 4 cores takes 32 min; SMP with 4 cores + GPU takes 49 min.
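
    To put the two timings above side by side, here is a minimal Python sketch. It uses only the two run times quoted in the post; the function and variable names are made up for illustration and are not part of OptiStruct.

```python
# Run times quoted in the post above, in minutes.
ddm_cpu_minutes = 32   # DDM, 4 cores, no GPU
smp_gpu_minutes = 49   # SMP, 4 cores + GPU


def slowdown(baseline: float, candidate: float) -> float:
    """Return how many times longer `candidate` takes than `baseline`."""
    return candidate / baseline


ratio = slowdown(ddm_cpu_minutes, smp_gpu_minutes)
extra_percent = (ratio - 1.0) * 100.0
print(f"SMP + GPU took {ratio:.2f}x the DDM time ({extra_percent:.0f}% longer)")
# prints: SMP + GPU took 1.53x the DDM time (53% longer)
```

    On that one test, switching the GPU on cost roughly half again as much wall-clock time, which is why a CPU-only baseline is worth recording before judging whether the GPU helps.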

  • Rutwik Gulakala
    Rutwik Gulakala Altair Community Member
    edited May 2022

    Hello Paul,

    Thank you for your quick response, as always.

    I am very confused about this because I ran the same file on my friend's workstation, which has a Quadro K2200, using the following command:

    -optskip -core In -nt 8 -gpu

    His configuration from the .out file is:

     

    ************************************************************************
    ** **
    ** **
    ** Altair OptiStruct(TM) 2019 **
    ** **
    ** Advanced Engineering Analysis, Design and **
    ** Optimization Software from Altair Engineering, Inc. **
    ** **
    ** **
    ** Linux 5.11.0-40-generic tandale-Precision-36 **
    ** 8 CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz **
    ** 31938 MB RAM, 2033 MB swap **
    ** **
    ** Build: 0960178_0833190_Le64RBNW8UH14M:157056-000 2000004000020016 **
    ************************************************************************
    ** COPYRIGHT (C) 1996-2019 Altair Engineering, Inc. **
    ** All Rights Reserved. Copyright notice does not imply publication. **
    ** Contains trade secrets of Altair Engineering, Inc. **
    ** Decompilation or disassembly of this software strictly prohibited. **
    ************************************************************************


    *** OptiStruct defaults set from:
    install config file: /home/tandale/2019/altair/hwsolvers/optistruct.cfg.


    NOTE # 9199
    MSGLMT=STRICT is active, all messages will be printed.
    You can suppress some less important warning messages by use of
    MSGLMT=BRIEF or UNREF (in config file or in the input data).

    *** INFORMATION # 9196
    Using 1 NVIDIA cards for GPU acceleration.

     

    It took 5 min 4 sec to execute 11 iterations.

    This is the result from my PC, which has a Quadro K4000, using the same command but with -nt 12:

    -optskip -core In -nt 12 -gpu

     

    ************************************************************************
    ** **
    ** **
    ** Altair OptiStruct(TM) 2019 **
    ** **
    ** Advanced Engineering Analysis, Design and **
    ** Optimization Software from Altair Engineering, Inc. **
    ** **
    ** **
    ** Windows 10 (Build 9200) DESKTOP-LR7O0T6 **
    ** 12 CPU: AMD Ryzen 5 2600 Six-Core Processor **
    ** 25780 MB RAM, 37582 MB swap **
    ** **
    ** Build: 0960191med33190_Ce64RBNW8UH14M:157056-000 2000004000020016 **
    ************************************************************************
    ** COPYRIGHT (C) 1996-2019 Altair Engineering, Inc. **
    ** All Rights Reserved. Copyright notice does not imply publication. **
    ** Contains trade secrets of Altair Engineering, Inc. **
    ** Decompilation or disassembly of this software strictly prohibited. **
    ************************************************************************


    *** OptiStruct defaults set from:
    install config file: C:/Program Files/Altair/2019/hwsolvers/optistruct.cfg.


    NOTE # 9199
    MSGLMT=STRICT is active, all messages will be printed.
    You can suppress some less important warning messages by use of
    MSGLMT=BRIEF or UNREF (in config file or in the input data).

    *** INFORMATION # 9196
    Using 1 NVIDIA cards for GPU acceleration.

     

    It took 11 min 29 sec on my PC with the Quadro K4000 GPU.
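
    As a rough way to compare the two reported run times, the sketch below converts them to seconds and takes the ratio. The helper name `to_seconds` is hypothetical. Note that the two runs differ in CPU, operating system, thread count and GPU, so the ratio by itself does not isolate the effect of the GPU.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration such as '5min 4sec' or '11min 29 sec' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*min\s*(\d+)\s*sec\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(group) for group in match.groups())
    return minutes * 60 + seconds


# Durations as reported in the post above.
friend_seconds = to_seconds("5min 4sec")    # Linux, i7-9700, Quadro K2200, -nt 8
own_seconds = to_seconds("11min 29 sec")    # Windows, Ryzen 5 2600, Quadro K4000, -nt 12

ratio = own_seconds / friend_seconds
print(f"friend: {friend_seconds} s, own PC: {own_seconds} s, ratio: {ratio:.2f}x")
# prints: friend: 304 s, own PC: 689 s, ratio: 2.27x
```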

  • PaulAltair
    PaulAltair
    Altair Employee
    edited May 2022

    OK, from what you show there, the first thing I note is that you are running '-nt 12' but your Ryzen only has 6 physical cores. In general, hyperthreaded virtual cores won't help much with solver performance (and may even hinder it), so you may well get better run performance (or at least no worse) with '-nt 6'.

    On your friend's Intel (which has 8 cores, so -nt 8 is OK), what is the run performance without the '-gpu' option? In other words, is the -gpu option actually speeding up the run?

    The floating-point performance of the Intel i7-9700 is over 50% better than your Ryzen's according to PassMark, so an 'expected' like-for-like run time on your machine would be around 8 minutes, I think (running -nt 6, assuming the GPU isn't helping in either case).

    If you can, try -nt 8 on your friend's machine and -nt 6 on yours, with no -gpu on either, and see what you get in terms of solution time. If you can share the .out files, then maybe we can get more information too.

    Finally, is the analysis you are running a linear static solution, eigenmode, or nonlinear? In the 2019 version that you are using, only linear static and modal were supported for GPU, I think (the documentation still states this for later versions, but MUMPS is supported in more recent versions). 2019 is 4 versions old now since the release of 2022.
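
    The "around 8 minutes" estimate and the suggested test runs can be sketched in Python as follows. This is only an illustration under stated assumptions: the 1.5 factor is the lower bound of "over 50% better", the launcher name `optistruct` and the input file `model.fem` are hypothetical placeholders, and the only solver options used are the ones that appear in this thread. The script builds the argument lists and prints them; it does not launch anything.

```python
# Friend's reported run time (5 min 4 sec) and the CPU factor from the post.
FRIEND_SECONDS = 5 * 60 + 4
CPU_FACTOR = 1.5  # assumption: lower bound of "over 50% better"

expected = FRIEND_SECONDS * CPU_FACTOR
minutes, seconds = divmod(round(expected), 60)
print(f"expected like-for-like run time: about {minutes} min {seconds} s")
# prints: expected like-for-like run time: about 7 min 36 s

SOLVER = "optistruct"  # hypothetical launcher name
MODEL = "model.fem"    # hypothetical input file

# Thread counts equal to the physical core counts discussed above.
machines = {"friend (i7-9700)": 8, "own PC (Ryzen 5 2600)": 6}


def build_args(threads: int, use_gpu: bool) -> list:
    """Build one command line using only options quoted in this thread."""
    args = [SOLVER, MODEL, "-optskip", "-core", "In", "-nt", str(threads)]
    if use_gpu:
        args.append("-gpu")
    return args


# One CPU-only run and one GPU run per machine, so the effect of -gpu
# can be read off on otherwise identical settings.
runs = [
    (name, use_gpu, build_args(threads, use_gpu))
    for name, threads in machines.items()
    for use_gpu in (False, True)
]

for name, use_gpu, args in runs:
    label = "with -gpu" if use_gpu else "CPU only"
    print(f"{name:24s} {label:10s} {' '.join(args)}")
```

    Since the factor is "over" 50%, 7 min 36 s is a lower bound, which is consistent with the rounded figure of about 8 minutes. Comparing the CPU-only and -gpu rows for the same machine is what answers whether the GPU is speeding anything up.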

  • Rutwik Gulakala
    Rutwik Gulakala Altair Community Member
    edited May 2022

    Hello Paul,

    Thank you again for your reply. I feel so dumb, man. I forgot that the 9700 has 8 physical cores and mine has only 6. Yes, you may well be correct. I will do what you have suggested and see how fast the simulation runs.

    Regarding the analysis, I am doing a linear static simulation. I am really sorry for wasting your time, and I really appreciate your patience in pointing out my mistake. Thank you so much once again. I will reply with my findings.

    Regards,
    Rutwik

  • PaulAltair
    PaulAltair
    Altair Employee
    edited May 2022

    Hey, no, it is fine. Finding the best combination of cores/GPU, LDM/DDM/SMP, RAMDISK, etc. for a job can be tricky sometimes. My GPU experience is limited; as I said, I found that with my hardware it didn't help much, and I just didn't go back to it after that. Linux vs. Windows could be a factor too, for sure: I think I recall some other solvers showing better GPU behaviour under Linux, so that could be the case here too (unfortunately I don't have a Linux machine to test GPU on). Good luck anyway.