How to select a GPU (Graphics Processing Unit) for EDEM

Stephen Cole
Altair Employee
edited March 2023 in Altair HPCWorks

There are some key properties of the GPU card specification that we can look at to determine how to select a GPU for Altair® EDEM™. These are discussed below; to summarise, we want a balance of the following properties:

  • FP32 and FP64 Performance (TFLOPS)
  • Bandwidth (GB/s)
  • Clock Speed (MHz) and Number of Cores
  • Memory (GB)

For all of the above, higher values are better. GPU specifications can be found on the Nvidia.com or TechPowerup.com/gpu-specs/ sites. In addition, it is essential that the card also meets the EDEM System Requirements (NVIDIA GPU required for CUDA solver):

The EDEM system requirements post also includes details on recommended RAM, CPU and hard disk space.  RAM and CPU components don’t influence the GPU simulation speed, as the GPU solver makes no significant use of the CPU or RAM.  Large simulations, which are typically run on GPU, can require a lot of data to be saved, so large and fast solid state drives are best.

Starting with “Why use a GPU?”, the answer is in our benchmark data. A single GPU can be more than 100x faster than 32 CPUs:

If you are interested in a turn-key, fully-maintained appliance running in the public cloud or on-premise as a private cloud, consider the Altair® Unlimited™ High-Performance Computing environment for EDEM. It combines top-end GPU hardware, system administration services and unlimited software licenses under one platform:

image

Not everybody who wants to add a GPU to existing hardware, or purchase new hardware, has access to the high-end NVIDIA A100 or the latest NVIDIA H100 GPU cards. So how do you select a GPU card based on budget and availability?

Firstly, it should be an NVIDIA GPU card. The latest EDEM GPU solver is a CUDA-based solver, which runs on NVIDIA cards only:

Beyond that, we need to look at the specification of the card. The two resources I use for this are the nvidia.com and Techpowerup.com sites. Example data for the A100 card is shown below:

  image

Left: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-nvidia-us-2188504-web.pdf Right: https://www.techpowerup.com/gpu-specs/a100-pcie-40-gb.c3623

This is a lot of data. The NVIDIA data sheet gives a good overview and the TechPowerup.com site lists more than 50 individual specifications for the card, so how do you navigate this data to choose a GPU for EDEM?

The EDEM GPU solver runs in double (FP64), mixed and single (FP32) precision modes. For reference, the EDEM CPU solver has always been double precision.

The number of cores helps with the parallelisation of the calculations, especially for larger numbers of particles. The memory clock speed allows for faster access to data and memory bandwidth determines how fast the GPU can move data from memory to the computational cores.

We need to avoid bottlenecks in the system. You may not get the full benefit of a card that has very fast FP32 calculations but low bandwidth, as the processors would spend a lot of time idling while the data is transferred.
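As a rough illustration of this balance, the sketch below divides peak FP32 throughput by memory bandwidth to give the number of floating point operations a card can perform for each byte it can fetch. This ratio is my own simplification and not something EDEM reports. The A100 figures are taken from the NVIDIA data sheet, and the second card is a hypothetical example with made-up numbers.

```python
def flops_per_byte(tflops, bandwidth_gb_s):
    """Peak floating point operations available per byte of memory traffic."""
    if tflops <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("tflops and bandwidth_gb_s must be positive")
    # 1 TFLOPS = 1e12 operations/s, 1 GB/s = 1e9 bytes/s
    return (tflops * 1e12) / (bandwidth_gb_s * 1e9)


# A100 PCIe 40 GB values from the NVIDIA data sheet.
a100 = {"fp32_tflops": 19.5, "bandwidth_gb_s": 1555}

# Hypothetical card (made-up values): fast FP32, low bandwidth.
unbalanced = {"fp32_tflops": 30.0, "bandwidth_gb_s": 300}

if __name__ == "__main__":
    for name, card in (("A100 PCIe 40 GB", a100), ("hypothetical card", unbalanced)):
        ratio = flops_per_byte(card["fp32_tflops"], card["bandwidth_gb_s"])
        print(f"{name}: {ratio:.1f} FP32 operations per byte")
```

A card with a much higher ratio than a known good performer needs more work per byte to keep its cores busy, so it is more likely to be waiting on memory. Treat this as an indication only, since real performance depends on the solver and the simulation.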

If running a multi-GPU system, data transfer between cards can use NVLink if the card and system support it.  NVLink isn’t a requirement, however, and only influences the data transfer time.  Data transfer is typically a small component of the overall computational cost, and in this blog I am looking at single card performance only.

The GPU memory size (GB) does not influence the speed; however, it does limit the number of particles that can be simulated. If we take the approximation of 1-2 kB per particle, then a 20 GB card will run out of memory at around 10-20 million particles. This figure is an approximation, as additional memory can be taken up by options such as field data or custom properties. Introducing a wider particle size distribution or including more spheres/elements in a particle also increases the memory use. Memory usage also varies with contacts, geometries, and precision modes.
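The arithmetic behind that estimate can be written as a small helper. This is a minimal sketch that assumes decimal units (1 GB = 1e9 bytes, 1 kB = 1e3 bytes) and uses only the 1-2 kB per particle approximation above; the function name `max_particles` is my own and is not part of EDEM.

```python
def max_particles(gpu_memory_gb, bytes_per_particle):
    """Rough upper bound on particle count for a given GPU memory size."""
    if gpu_memory_gb <= 0 or bytes_per_particle <= 0:
        raise ValueError("inputs must be positive")
    # Decimal units: 1 GB = 1e9 bytes. Ignores memory used by field data,
    # custom properties, contacts, geometries and the precision mode.
    return int(gpu_memory_gb * 1e9 // bytes_per_particle)


if __name__ == "__main__":
    for kb in (1, 2):
        n = max_particles(20, kb * 1000)
        print(f"{kb} kB per particle: about {n / 1e6:.0f} million particles")
```

For a 20 GB card this reproduces the 10-20 million range quoted above. In practice the limit will be lower once the extra memory uses listed in the text are included.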

Looking at the specification, even within one card type (the NVIDIA A100 in this case) there are variations in these key areas:

image

The Form Factor is the size and shape of the card, and it is important that the card can physically fit into the machine. The cooling and power requirements of the GPU card should also be checked with the supplier to ensure the machine can support them.  GPU cards are often connected to the machine via the industry standard PCIe port; however, high-end GPUs may need greater bandwidth than PCIe can supply, so NVIDIA also supplies cards with an SXM connection to allow the cards to reach their maximum potential. We can see that the SXM form factor cards have higher specifications in the table above and below.

If we compare the V100 to the A100, this really does highlight the benefit of the bandwidth. In our benchmarks the A100 is around 1.8x faster than the V100; however, it is important to note that this speed-up is a measure of all the different improvements working together.

image

The H100 GPU was released in 2022 and we are expecting good improvements from it; however, I am going to omit it from the analysis of existing cards below as it would skew the data. I will also omit the SXM form factor as PCIe is standard.

image

Looking at some of the GeForce and other selected cards, we can see that these cards are focussed on different applications (especially the GeForce, which is focussed on gaming), with high FP32 speeds and clock speeds but low FP64 speed. It is worth noting that even though it is the lowest spec in this list, the RTX 2080 Ti does run EDEM significantly faster than 16 CPUs. However, given its 2018 release date and the development rate of new hardware, the 2080 Ti is no longer so suitable for the latest, largest simulation runs that the more recently released cards are capable of.

image

I have omitted price in the tables above as this varies with region and demand, and it is also something that becomes outdated quickly. The best all-round GPU cards for EDEM over the last couple of years have been the A100 and V100. If these are not available to you, then you can check whether the cards that are available are supported and compare them against the specifications listed above to gain an idea of whether the card will be comparable to the high-end or lower-end GPUs available.
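One simple way to make that comparison is to express each property of a candidate card as a fraction of a reference card. The sketch below is only an illustration of the idea: the reference values are the A100 PCIe 40 GB figures from the NVIDIA data sheet, the candidate values are made-up placeholders to be replaced with the specification of the card you are considering, and the function name is my own.

```python
# Reference: A100 PCIe 40 GB values from the NVIDIA data sheet.
REFERENCE = {
    "fp32_tflops": 19.5,
    "fp64_tflops": 9.7,
    "bandwidth_gb_s": 1555,
    "memory_gb": 40,
}


def relative_specs(candidate, reference=REFERENCE):
    """Return each candidate property as a fraction of the reference value."""
    return {key: candidate[key] / reference[key] for key in reference}


if __name__ == "__main__":
    # Placeholder values only, not a real card: substitute the data sheet
    # figures for the card you are considering.
    candidate = {
        "fp32_tflops": 13.0,
        "fp64_tflops": 0.4,
        "bandwidth_gb_s": 600,
        "memory_gb": 11,
    }
    for key, ratio in relative_specs(candidate).items():
        print(f"{key}: {ratio:.0%} of reference")
```

The ratios are not a performance prediction, as the benchmark speed-up comes from all of the properties working together, but they show quickly where a card sits relative to the high end.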

Looking to get started using EDEM? Go here for e-learning, instructor-led training and tutorials:

How long does it take to run an EDEM Simulation?  A GPU improves simulation computational performance; however, you may still want to predict simulation run times:

When running on GPU, to check the GPU load on Windows you can go to Task Manager > Performance and select the GPU.  It’s important to choose ‘Cuda’ from the dropdown box, as this shows how hard the card is working on the current simulation.
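If you prefer the command line, or are running on Linux, the `nvidia-smi` tool installed with the NVIDIA driver reports similar information. The sketch below assumes `nvidia-smi` is on the PATH and wraps its `--query-gpu` option; the Python function names are my own. Note that `utilization.gpu` is the overall utilisation figure reported by the driver, which may not match the ‘Cuda’ graph in Task Manager exactly.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert a numeric field to int; return None for values such as [N/A]."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_status(csv_text):
    """Parse nvidia-smi CSV output (noheader, nounits) into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        util, used, total = line.split(",")
        rows.append({
            "utilization_pct": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return rows


def query_gpu_status():
    """Run nvidia-smi once and return the parsed status of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_status(result.stdout)


if __name__ == "__main__":
    try:
        for index, gpu in enumerate(query_gpu_status()):
            print(f"GPU {index}: {gpu}")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"Could not query nvidia-smi: {err}")
```

Running this while a simulation is in progress gives a snapshot of the load and of how close the card is to its memory limit.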

How do we optimise the material properties for simulation speed?

How do we automate the running of the simulations?