How is a disk resource created in PBS Pro for use in batch job scheduling?

John_22254 New Altair Community Member
edited January 2023 in Community Q&A

I wish to have all jobs assigned a default scratch space amount as a resource.

In practice, the PBS scheduler should not assign a node to a job unless the node has at least that much scratch disk space available.

I have two scratch volume types in the complex:

- nodes sharing a single, large scratch directory (/scratch/pbs_spooler)

- specialized nodes with a local scratch directory (/scratch/pbs_spooler)

I need to introduce "disk" as a resource. I wish to be clear about the procedure for carrying this out.

Once the resource is integrated, I expect qstat -Bf to report "disk" along these lines:

"resources_default.disk =  2147483648kb"     <---- (2 TB example)

This would also provide flexibility to assign a non-default amount to the "disk" resource.

 

Any insight on this procedure will be greatly appreciated.

John

 

This is what I settled on: an execution host hook that periodically updates a host-level custom resource.

The resource is named dynscratch and works like other host-level resources, for example ncpus. It tracks the free scratch disk space collected by the periodic hook, which runs every 30 seconds. Jobs known to consume significant disk space can be queued with this resource so that every execution node assigned to the job has the required disk space. For example, including dynscratch=2000gb in the PBS select statement ensures that at least 2000 gigabytes are free in the /scratch file system on each execution node assigned to the job.
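As a minimal sketch of the request side, the helper below builds a select statement that asks for dynscratch on every chunk, and shows the "at least this much free" rule described above. The function names, the chunk and ncpus values, the node names, and the job script name are hypothetical and only for illustration; the eligibility check is a simplified picture of the rule, not the scheduler's actual code.

```python
def build_select(chunks, ncpus, scratch_gb, resource="dynscratch"):
    """Return a PBS select string requesting scratch space on each chunk."""
    if chunks < 1 or ncpus < 1 or scratch_gb < 0:
        raise ValueError("chunks and ncpus must be >= 1, scratch_gb >= 0")
    return "select=%d:ncpus=%d:%s=%dgb" % (chunks, ncpus, resource, scratch_gb)


def node_is_eligible(available_gb, requested_gb):
    """Simplified rule: the node must report at least the requested space."""
    return available_gb >= requested_gb


# Hypothetical free-space values, as the periodic hook might report them.
nodes = {"node01": 3500, "node02": 1200}

select = build_select(chunks=1, ncpus=8, scratch_gb=2000)
print("qsub -l %s job.sh" % select)  # job.sh is a placeholder script name

for name in sorted(nodes):
    print(name, "eligible" if node_is_eligible(nodes[name], 2000) else "skipped")
```

A job that omits dynscratch from its select statement is not filtered on scratch space unless a default has been set for the resource.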

I accomplished this using the PBS Professional Hooks Guide, under the direction of Altair support.

John 

import pbs
import os
import sys

GB = 1073741824  # bytes per gigabyte (1024**3)

def get_filesystem_avail_unprivileged(dirname):
  # Free space usable by unprivileged users.
  # statvfs counts blocks in units of f_frsize; f_bavail excludes reserved blocks.
  o = os.statvfs(dirname)
  size_gb = (o.f_frsize * o.f_bavail) // GB
  return pbs.size("%dgb" % size_gb)

def get_filesystem_avail_privileged(dirname):
  # Free space usable by privileged users.
  # f_bfree counts all free blocks, including those reserved for root.
  o = os.statvfs(dirname)
  size_gb = (o.f_frsize * o.f_bfree) // GB
  return pbs.size("%dgb" % size_gb)

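try:
  # Map each custom resource name to [function, directory to measure].
  # The key must match the name of the host-level custom resource
  # (dynscratch, as described above).
  dyn_res = {}
  dyn_res["dynscratch"] = [get_filesystem_avail_unprivileged, "/scratch"]

  vnl = pbs.event().vnode_list
  local_node = pbs.get_local_nodename()

  # Publish the current value of each resource on this execution host.
  for name, (func, dirname) in dyn_res.items():
    vnl[local_node].resources_available[name] = func(dirname)

except SystemExit:
  pass

except Exception:
  e = pbs.event()
  e.reject("%s hook failed with %s. Please contact HPC Admin" %
           (e.hook_name, sys.exc_info()[:2]))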
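
The free-space calculation can be checked on a node outside PBS before the hook is installed. The following is a minimal standalone sketch that needs only the Python standard library, not the pbs module. The function name avail_gb and the path "/" are placeholders chosen so the sketch runs on any Unix host; substitute /scratch on an execution node. It multiplies by f_frsize, the unit in which POSIX statvfs reports its block counts.

```python
import os

GIB = 1024 ** 3  # bytes per gigabyte, matching the divisor used in the hook


def avail_gb(dirname, privileged=False):
    """Return whole gigabytes free on the file system holding dirname."""
    st = os.statvfs(dirname)
    # f_bavail: blocks free for unprivileged users.
    # f_bfree: all free blocks, including those reserved for root.
    blocks = st.f_bfree if privileged else st.f_bavail
    return (st.f_frsize * blocks) // GIB


path = "/"  # placeholder; use "/scratch" on an execution node
print("unprivileged: %d gb" % avail_gb(path))
print("privileged:   %d gb" % avail_gb(path, privileged=True))
```

Comparing these numbers with the output of df for the same file system is a quick way to confirm that the value the hook publishes is the one intended.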