Configuring GPU for PhysicsAI model training

Garima_Singh
Altair Employee


Hello PhysicsAI users,

As you may already know, the GPU can be leveraged for model training in PhysicsAI. However, there are a few important points to consider to make sure that PhysicsAI actually utilizes the GPU for training.

First consideration: required software setup for supporting GPU usage

  1. Checking CUDA/CUDNN library compatibility with PhysicsAI: make sure that CUDA Toolkit 11.8 & CUDNN 8.7 are installed on the machine.
  2. Kindly refer to the ‘Appendix’ section for the corresponding download links.

Note: No other versions of the CUDA Toolkit are supported at this time because other HyperWorks products depend on CUDA Toolkit 11.8 & CUDNN 8.7.

To cross-check whether the Windows machine has the required CUDA libraries installed, follow the steps below:

  • Checking CUDA Toolkit version: open the Windows Command Prompt/PowerShell & run the command nvcc --version to get the CUDA version information (as shown in Figure 1).

Note: If you are performing a new installation of CUDA Toolkit 11.8 from the link mentioned in point 2 above, NVIDIA might prompt you to install the latest CUDA version. Kindly install it as-is & verify the version (as shown in Figure 1).


Figure 1: Checking CUDA Toolkit version using nvcc --version in the Windows PowerShell
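
In addition to nvcc --version, the CUDA & CUDNN versions can also be cross-checked from inside HyperMesh by asking TensorFlow which versions it was built against. Below is a minimal sketch using tf.sysconfig.get_build_info(); it assumes the Python Window (View > Python Window) exposes the same TensorFlow installation that PhysicsAI uses, as in the compute-capability check later in this article.

# Sketch: report the CUDA/CUDNN versions the bundled TensorFlow was built against
import tensorflow as tf
build = tf.sysconfig.get_build_info()
# The exact string format can vary between TensorFlow builds; .get avoids errors on CPU-only builds
print(build.get("cuda_version"))
print(build.get("cudnn_version"))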

Second consideration: required hardware for supporting GPU usage

  1. Checking GPU compatibility with PhysicsAI: make sure that the GPU being considered has the correct specifications in terms of ‘Compute Capability’.
  2. The oldest Compute Capability supported by PhysicsAI is 6.0.
  3. To check this, open the Python console in HyperMesh (View > Python Window) & run the commands below to list the GPU specifications & ‘Compute Capability’ (as shown in Figure 2).

# List the devices visible to TensorFlow, including the GPU name, memory & compute capability
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())


Figure 2: GPU specifications & Compute Capability using the Python Window in HyperMesh
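
Alternatively, the Compute Capability can be read directly through TensorFlow’s device APIs. A minimal sketch, assuming the same TensorFlow installation is available in the Python Window:

# Sketch: list the GPUs TensorFlow can see & print their compute capability
import tensorflow as tf
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    # 'compute_capability' is a (major, minor) tuple, e.g. (8, 6); PhysicsAI needs 6.0 or newer
    print(gpu.name, details.get("compute_capability"))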

FAQs related to using GPU in PhysicsAI:

  • Checking GPU usage for model training:
  1. Traditionally, GPU usage (CUDA graphs) can be checked using the Windows Task Manager. However, there have been instances of the GPU usage (CUDA graphs) not being shown in the Task Manager, & hence the suggestions above (& the quick runtime check after Figure 3) can be useful to check whether the GPU or the CPU is being used for PhysicsAI model training.
  2. Once the GPU has a Compute Capability of 6.0 or greater & CUDA Toolkit 11.8 & CUDNN 8.7 are installed, PhysicsAI will automatically use the GPU for model training, & the log file under ‘PhysicsAI > Model Training > Show Log’ will display the GPU memory usage at the end of the training log (as shown in Figure 3).


Figure 3: GPU usage by PhysicsAI for the model training indicated by the ‘Model Training Log’
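
If the Task Manager does not show the CUDA graphs, a quick runtime check from the Python Window can also confirm where TensorFlow places its operations. A minimal sketch; the exact device string is illustrative:

# Sketch: run a small operation & check which device TensorFlow placed it on
import tensorflow as tf
x = tf.random.uniform((1000, 1000))
y = tf.matmul(x, x)
print(y.device)  # ends with 'GPU:0' when the GPU is being used, 'CPU:0' otherwise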

  • Disabling GPU usage for PhysicsAI model training:

Once the GPU is detected by PhysicsAI, it will automatically be used for model training. If, in certain scenarios, CPU usage is required instead, set the environment variable as shown below:

CUDA_VISIBLE_DEVICES=-1

This is useful in scenarios where the GPU does not have enough memory to train the model, for reasons related to the number of elements/nodes in the model or hyper-parameters such as width/depth/epochs, & PhysicsAI gives the error ‘Resources exhausted during training - Try reducing width, depth or size of the mesh’.
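
The variable is normally set at the operating-system level (for example, under Windows environment variables) before launching HyperWorks. As a sketch of an alternative, it can also be set from Python, provided this happens before TensorFlow first initializes the GPU:

# Sketch: hide the GPU from TensorFlow so the training falls back to the CPU
# Note: this must run before TensorFlow first touches the GPU, otherwise it has no effect
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"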

  • Memory consumption by GPU for the PhysicsAI model training:
  1. During model training, one training batch is moved from the CPU to the GPU at a time. For example, if a batch size of 1 is used, only one model (mesh) resides on the GPU at a time, & hence the peak memory usage is governed by the largest model (mesh) in the training data, not by the total number of training meshes.
  2. Increasing the batch size (as shown in Figure 4) increases the GPU peak memory usage & makes each epoch faster. However, increasing the batch size can also affect the model accuracy & might require adjusting the learning rate.
  3. PhysicsAI 2024.0 supports a mini-batch size feature that can be fine-tuned as per the requirement.


Figure 4: Batch size configuration in PhysicsAI (based on Altair internal content)
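
When experimenting with the batch size, it can help to see how close a run comes to the GPU’s memory limit. TensorFlow can report the current & peak memory it has used on the device; a minimal sketch, assuming the bundled TensorFlow is version 2.5 or newer:

# Sketch: report current & peak GPU memory used by TensorFlow (values in bytes)
import tensorflow as tf
if tf.config.list_physical_devices("GPU"):
    info = tf.config.experimental.get_memory_info("GPU:0")
    print("current:", info["current"], "peak:", info["peak"])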

Other relevant blogs on PhysicsAI:

To understand other details related to PhysicsAI, kindly refer to the blog ‘13 Frequently Asked Questions About Altair physicsAI’ using the link below: https://community.altair.com/community?id=community_blog&sys_id=5e76d67f1b7c7510c4dfdbd9dc4bcba6

Appendix: