🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

Scaling with PBS Professional and Dask for Python

User: "Rosemary Francis_21150"
New Altair Community Member
Updated by Rosemary Francis_21150

Python has grown to be the dominant language in programming and data analytics, and Python libraries like NumPy, pandas, and scikit-learn make it easy to reuse code. But because those libraries weren’t designed to scale beyond a single machine, HPC developers often turn to projects like Dask, a library for parallel computing which enables scaling to multi-core systems and distributed clusters, on-premises and in the cloud. 

Dask has a familiar Python API, support for GPU acceleration, and an interactive dashboard for real-time visibility. Dask works well in traditional data-science environments like Hadoop clusters and also on high-performance computers, and it’s used by organizations including Microsoft and NASA. 

Dask integrates with our Altair® PBS Professional® workload manager, using Dask-jobqueue. It requests nodes from PBS Professional to create a “Dask cluster” with the PBS Professional nodes. Dask-jobqueue provides a convenient interface that’s accessible from interactive systems like Jupyter Notebook and batch jobs. You can run PBS Professional jobs using Python commands in Dask, and you can run existing workloads on a common PBS-managed cloud. This reduces the need to procure separate systems for separate workloads and provides chunks of resources from a common pool. 

Administrating a single infrastructure is easier for IT, with only one set of users and policies. Pooling resources means better resource utilization – and with a capable scheduler, you can outdo domain-specific schedulers and leverage organization-wide policies like allocation management and fairshare. 

To see Dask on PBS Professional in action, watch Matthew Rocklin's video as he demonstrates setting up an interactive computing environment using Dask with PBS Professional on NCAR’s “Cheyenne” supercomputer for atmospheric science. You can see PBS jobs being submitted at around the 6:40 mark.  

 


https://www.youtube.com/watch?v=FXsgmwpRExM

Comments

No comments on this post.