Runtime error when using GPU

Raïsa Roeplal
Altair Community Member
edited December 2023 in Community Q&A

Dear all,

I am running a computationally expensive simulation on one GPU and keep running into a runtime error (see screenshot).

Any ideas on why this is happening?

Thanks in advance for any replies.

Cheers,

Raisa

image

Answers

  • Stephen Cole
    Altair Employee
    edited June 2023

    Hi Raisa,


    Are you using the CUDA GPU solver? If so, could the GPU be running out of memory?

    In the Simulator you can select the 'show GPU memory usage' option to check usage.

    image

    Regards

    Stephen

  • Raïsa Roeplal
    Altair Community Member
    edited July 2023

    Hi Stephen,

    Indeed, this problem occurs when I have too many particles for the GPU to handle.

    I am now trying to use two GPUs for the simulation, but I get the same problem. On a single GPU, the simulation ran until approximately 73%, and with two GPUs it runs until approximately 83%. I expected the simulation to finish with two GPUs, as I have doubled the memory. Why does this not work, and what would be the way to complete the simulation?

    Thanks in advance!

  • Stephen Cole
    Altair Employee
    edited July 2023

    Hi Raisa,


    It may be the way the load is balanced across the GPUs. It is not always possible to get a 50:50 split, so it sounds like one GPU is carrying more of the load in this case.

    Can you confirm that you are using the latest version, 2022.3? That may help with the load on the GPU.

    You could also try running in mixed or single precision mode; see "EDEM CUDA GPU - Precision Modes".


    Regards

    Stephen
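
To illustrate the point about uneven load balancing, here is a rough sketch of why two GPUs do not necessarily double the number of particles that fit in memory. It assumes that memory use grows linearly with the particles assigned to a GPU and that the GPU with the largest share runs out first; it ignores any per-GPU overhead. The function name and the 60:40 split are hypothetical and are not taken from EDEM.

```python
def effective_capacity(single_gpu_capacity: float, shares: list[float]) -> float:
    """Total particle count that fits when GPU i holds shares[i] of all particles.

    Assumption (not from EDEM): memory use is proportional to particle count,
    so the GPU with the largest share is the first to run out of memory.
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return single_gpu_capacity / max(shares)


# Capacity is expressed relative to what one GPU can hold on its own.
even = effective_capacity(1.0, [0.5, 0.5])    # perfect 50:50 split
uneven = effective_capacity(1.0, [0.6, 0.4])  # hypothetical 60:40 split

print(f"50:50 split: {even:.2f}x a single GPU")
print(f"60:40 split: {uneven:.2f}x a single GPU")
```

Under these assumptions a 60:40 split gives roughly 1.67 times the single-GPU capacity rather than 2 times, which would be consistent with a run that gets further on two GPUs but still fails before the end.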

  • Raïsa Roeplal
    Altair Community Member
    edited December 2023

    Hi Stephen,

    I am now using EDEM 2023 on a high performance cluster, and I am trying to see how much I can speed up my simulations by using multiple GPUs. I ran one simulation on a single GPU and then repeated it with 2 GPUs. So far I am not seeing any significant speed-up: the runtime was 9.8 hours with one GPU and 9.7 hours with 2 GPUs. Is there a way to track how the load is balanced between the 2 GPUs?

    Thanks in advance!

  • Stephen Cole
    Altair Employee
    edited December 2023

    Hi Raisa,

    You can use Windows Task Manager > Performance to check the GPU load; just make sure that the CUDA option is chosen from the drop-down:

    image

    Running EDEM simulations with the flag --debug-logger c:\location\filename.txt gives information about where the computational time is spent. The output is a bit difficult to read, but it can give some insight.

    Is there a high file save rate in the simulations? That typically restricts the GPU speed. Custom API models can also reduce the expected improvement, and couplings with other products or low numbers of particles can limit the speed benefit on GPU.


    Regards

    Stephen
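
On cluster nodes where Task Manager is not available (for example Linux nodes, which is an assumption, since the thread does not state the operating system), per-GPU memory and utilisation can usually be read with NVIDIA's `nvidia-smi` tool instead. The sketch below is not an EDEM feature: it simply calls `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and parses the result. The function names are hypothetical, the sample text is made-up data used only to show the parsing, and the parser assumes every field is numeric (some GPUs report `[N/A]` for certain fields).

```python
import subprocess

# Fields requested from nvidia-smi; memory is reported in MiB, utilisation in percent.
QUERY = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse 'csv,noheader,nounits' output from nvidia-smi into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total, util = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
            "util_percent": int(util),
        })
    return rows


def query_gpus() -> list[dict]:
    """Run nvidia-smi once and return the current state of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


# Made-up sample output for two GPUs, only to demonstrate the parser.
sample = "0, 15200, 16384, 97\n1, 9800, 16384, 41\n"
for gpu in parse_gpu_csv(sample):
    fill = 100 * gpu["mem_used_mib"] / gpu["mem_total_mib"]
    print(f"GPU {gpu['index']}: {fill:.0f}% memory used, {gpu['util_percent']}% busy")
```

Calling `query_gpus()` every few seconds while the simulation runs (for instance in a loop with `time.sleep`) and logging the result would show whether one GPU is consistently fuller or busier than the other.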