Eliminating noisy neighbours and bad I/O at Diamond Light Source
In high-performance computing (HPC), no two environments are the same, because each centre runs a different range of applications. Workload managers like Altair® Grid Engine® have sophisticated options that ensure compute resources are shared effectively and that are flexible enough to cope with the differing needs of different sites. Altair® Grid Engine®, for example, manages datacenters around the world with optimization, efficiency, and performance baked into its fabric.
However, even the most advanced workload managers cannot control access to storage, and that is where Altair Mistral™ comes in. Mistral is a system monitoring tool with detailed metrics that track how applications access shared storage. It offers live system telemetry and I/O monitoring for system administrators looking to eradicate I/O bottlenecks.
Application monitoring and problem solving at Diamond Light Source
Diamond Light Source, in the United Kingdom, uses radiation beams from high-speed electrons in a range of experiments arranged around the edges of the accelerator. The machine accelerates electrons to near light speed so that they give off light 10 billion times brighter than the sun. Each ray of light, or “beamline”, is used in a variety of experiments, from jet engine tests to Covid-19 variance predictions. Each experiment requires a multitude of applications, both in-house and third-party, and produces huge datasets. The HPC team in turn must be able to store this data and access it quickly.
With Altair Grid Engine managing the workloads generated by these experiments, Mistral was used to assess how efficiently applications access data, with the goal of improving performance or changing the way third-party applications are used.
Key results include:
- identifying an application that created unnecessary metadata; addressing it markedly decreased the application's runtime and reduced the metadata load on the overall system;
- spotting the long PYTHONPATH settings of certain application setups, which forced the system to trawl multiple locations before finding the right one and launching from the last configured entry. The team was able to amend the PYTHONPATH to greatly improve the launch time of these applications;
- and catching “noisy neighbours” in the act. Mistral logged job IDs as well as hostnames to identify the programs that oversaturated the filesystem.
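The PYTHONPATH finding can be illustrated with a minimal sketch. It is not Mistral output and not Diamond's code; it assumes only Python's standard import machinery, and the helper name `count_probed_dirs` is hypothetical. It counts how many search-path directories are probed before a module is found, which is the lookup cost that a long path adds to every launch on shared storage.

```python
import importlib.machinery
import sys


def count_probed_dirs(module_name, search_path=None):
    """Count the search-path entries probed before a module is found.

    Hypothetical helper for illustration only. Returns a tuple
    (entries_probed, origin), where origin is the file the module would
    be loaded from, or None if no entry on the path provides it.
    """
    # Default to the interpreter's own search path, which includes
    # whatever PYTHONPATH contributed at start-up.
    entries = list(sys.path if search_path is None else search_path)
    for position, entry in enumerate(entries, start=1):
        # Ask the standard path-based finder about this one entry only.
        spec = importlib.machinery.PathFinder.find_spec(module_name, [entry])
        if spec is not None:
            return position, spec.origin
    return len(entries), None


if __name__ == "__main__":
    probed, origin = count_probed_dirs("json")
    print(f"json found after probing {probed} path entries: {origin}")
```

A module that resolves only from the last entry of a long path costs one failed lookup per earlier entry, so moving the relevant directory forward or removing unused entries reduces the filesystem traffic at launch.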
“By using Mistral, our team has already made a marked improvement to various applications that we maintain in-house. We have been able to reduce the impact of noisy neighbours, reduce runtime and identify applications with bad I/O. We intend to keep using Mistral to profile more applications and improve the overall architecture of our systems for in-house and third-party tools.”
—Fredrik Ferner, senior computer systems administrator, Diamond Light Source
Read the in-depth coverage of Diamond Light Source's success with Altair Mistral here.