Agile and cloud-ready genome pipelines at the Wellcome Sanger Institute

Rosemary Francis_21150
edited February 2023 in Altair HPCWorks

It’s easy for the complexity of high-performance computing (HPC) to cloud the real purpose of the workload. Across HPC, but especially in life sciences, accelerating the time to science can save lives. At Altair, our tools and processes are not just about improving return on investment. Our products aim to give you more time to focus on workload results by reducing the time you spend worrying about job infrastructure.

Understanding how applications access data is key to removing bottlenecks and making the best use of expensive compute resources. Altair Breeze™ is a unique tool for profiling applications and understanding how they use shared storage. Our customers choose our I/O profiling tools to get the performance information they need to optimize applications, to identify and solve software deployment problems, and to support portability and migration, all without getting mired in the details.

Agile and cloud-ready genome pipelines at the Wellcome Sanger Institute 

In Cambridgeshire, England, researchers at the Wellcome Sanger Institute rely on HPC solutions to manage their vital genomic discovery research. One of the institute's most transformative endeavors, the Cancer Genome Project, uses high-throughput genome sequencing to identify somatically acquired mutations, with the aim of characterizing cancer genes, mutational processes, and patterns of clonal evolution in human tumors. These projects generate enormous amounts of data: each cancer sample alone generates around 250 GB.

Users at the Wellcome Sanger Institute used Mistral to profile the genomic pipelines and improve the efficiency of their I/O patterns, and Breeze to profile the containerized workloads running on Amazon Web Services (AWS). With these tools they were able to identify and optimize small reads that could harm compute and storage performance. Fixing these small reads allowed the storage to run at peak performance with little impact on other users’ jobs.
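To illustrate why small reads matter, here is a minimal Python sketch that counts how many unbuffered read calls it takes to consume the same file at two different request sizes. It is not how Mistral or Breeze work internally, and it is not taken from the Sanger pipelines; the function names `count_reads` and `make_sample_file`, the 1 MiB file size, and the 64-byte request size are all assumptions chosen for illustration.

```python
import os
import tempfile


def make_sample_file(size_bytes):
    """Create a temporary file of the given size and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
    return path


def count_reads(path, chunk_size):
    """Read the whole file with unbuffered os.read calls.

    Returns (total_bytes_read, number_of_read_calls_that_returned_data).
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)
            if not data:  # empty result means end of file
                break
            total += len(data)
            calls += 1
    finally:
        os.close(fd)
    return total, calls


if __name__ == "__main__":
    size = 1024 * 1024  # 1 MiB sample file (illustrative size)
    path = make_sample_file(size)
    try:
        for chunk in (64, size):
            total, calls = count_reads(path, chunk)
            print(f"request size {chunk:>8} bytes: {calls} read calls for {total} bytes")
    finally:
        os.remove(path)
```

Each read call is a separate request to the operating system, and on shared storage potentially a separate request to the file system. The same data moved in far fewer, larger requests usually places less load on the storage, which is the kind of pattern an I/O profiler can help surface.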

By deploying these I/O profiling tools, the Wellcome Sanger Institute saved both time and money during a complex and high-value project. In fact, they could save over 10% in production costs, and the project’s run time was reduced from 32 hours to 18 hours, a reduction of roughly 44%. With projects consuming millions of core hours, even small savings are significant.

“Improving run time often doesn’t require extensive rewrites. Knowing where to look is key.” 

-Keiran Raine, Cancer Researcher 

Read the full Wellcome Sanger Institute customer story and the white paper for details about their challenge, solution, and success.