Maximising return on investment on GPUs with Altair Grid Engine

Rosemary Francis

Our customers choose Grid Engine because it delivers high system utilization without being complex to administer. Users get their results more quickly, IT admins don't have to babysit queues and chase down slowdowns, and the organization's budget managers are happy because high utilization and fast time to results represent the best return on investment.

Altair Grid Engine was designed for distributed resource management and workload optimization, and it is used in thousands of data centers. It's popular with users because so much works right out of the box, which means higher utilization and less system administration overhead. The installation and configuration process is highly automated and guided, so it is very hard to make a mistake when setting up Grid Engine. The architecture scales easily to millions of jobs while also supporting very wide workloads and more exotic AI workloads that run in containers. With very high utilization and reliable scalability, Altair Grid Engine makes sure that you get the most from your expensive compute estate.
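To make that concrete, here is a minimal sketch of how a containerized GPU job might be submitted programmatically through Grid Engine's DRMAA interface using the `drmaa` Python bindings. The resource names (`gpu`, `docker`, `docker_images`), the container image, and the script path are illustrative assumptions; the exact complexes available depend on how your cluster is configured.

```python
# Minimal sketch: submit a containerized GPU training job through Grid Engine's
# DRMAA interface. Assumes the `drmaa` Python bindings are installed and that the
# cluster defines `gpu`, `docker`, and `docker_images` resource complexes
# (these names are site-specific and shown only for illustration).
import drmaa

with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = "/home/user/train.sh"        # hypothetical training script
    jt.args = ["--epochs", "10"]
    # Request two GPUs and a container image (illustrative resource requests).
    jt.nativeSpecification = (
        "-l gpu=2 "
        '-l docker,docker_images="*pytorch/pytorch:latest*" '
        "-cwd -j y"
    )
    job_id = session.runJob(jt)
    print(f"Submitted job {job_id}")
    session.deleteJobTemplate(jt)
```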

Features we love include: 

  • Support for GPUs, containers, and MPI technology (see the sketches above and after this list)  
  • Cloud bursting automation  
  • High system reliability and world-class support 
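As a companion sketch, a tightly coupled MPI job can be requested through the same DRMAA session by asking for a parallel environment. The environment name `mpi`, the slot count, and the wrapper script are assumptions; parallel environments are defined by the cluster administrator, and `qconf -spl` lists the ones configured on your cluster.

```python
# Sketch: request 32 slots in a site-defined parallel environment for an MPI job.
# The PE name "mpi" is an assumption; check `qconf -spl` for the PEs on your cluster.
import drmaa

with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = "/home/user/run_mpi.sh"   # hypothetical wrapper that calls mpirun
    jt.nativeSpecification = "-pe mpi 32 -cwd -j y"
    job_id = session.runJob(jt)
    print(f"Submitted MPI job {job_id}")
    session.deleteJobTemplate(jt)
```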

Workload Management for Machine Learning at ISI 

In Southern California, the VISTA lab team at the Viterbi School of Engineering's Information Sciences Institute (ISI) uses machine learning extensively in areas like facial identification and handwriting recognition, sharing a cluster with another research group. The team initially distributed jobs manually, and its attempt to use an open-source scheduler struggled to manage the lab's expensive GPU resources and became a time drain that took away from research. They needed a more sophisticated solution.

ISI chose Altair Grid Engine for its built-in advanced GPU support, detailed documentation, and ongoing product upgrades. Altair’s customer support was another factor. Now they’re running smoothly and training artificial neural networks to advance the state of research for facial recognition. 

“With Altair Grid Engine, we have an infrastructure that schedules workloads to GPUs. We operate our infrastructure at 95% capacity with lower overall costs.”  

— Stephen Rawls, Research Analyst 

Read the ISI customer story for details about their success, including a deep-learning training experiment that processed more than 3 million images without a single failure.