Submitted a job to a newly created "interactive" queue, ia_cfd_q.
When tested, the scheduler reports "Job will never run with the resources currently configured in the complex", referring to this issue:
11/01/2021 10:53:22;0400;pbs_sched;Job;1129.cnetr-batch;Chunk: 1:ncpus=4:mem=10gb:Qlist=ia_cfd_q
11/01/2021 10:53:22;0400;pbs_sched;Job;1129.cnetr-batch;Found 0 out of 1 chunks needed
11/01/2021 10:53:22;0040;pbs_sched;Job;1129.cnetr-batch;Insufficient amount of resource: Qlist (ia_cfd_q != cfd_q,np_cfd_q)
I assigned ia_cfd_q to Qlist on 15 nodes (40 ncpus x ~180 gb each), both alone and as part of the list "ia_cfd_q,cfd_q,np_cfd_q".
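For reference, the Qlist assignment described above can be done per node with qmgr; the node name r08n30 is taken from the node listing further down, and the exact commands used may have differed:

```shell
# Sketch of adding the new queue to a node's Qlist (a string_array resource).
# Setting the value replaces any previous list, so the existing queues must
# be included alongside the new one.
qmgr -c 'set node r08n30 resources_available.Qlist="cfd_q,np_cfd_q,ia_cfd_q"'

# Variant with the new queue alone, as also tested:
# qmgr -c 'set node r08n30 resources_available.Qlist="ia_cfd_q"'
```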
I can move this job from ia_cfd_q to another queue, cfd_q, and it runs fine.
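The move between queues described above is a single qmove, using the job ID from the qstat output below (hedging: this is a sketch of the step, not necessarily the exact command run):

```shell
# qmove takes the destination queue first, then the job identifier.
qmove cfd_q 1129.cnetr-batch
```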
The cfd nodes are all the same hardware in the same cluster and share the same job_dir Lustre file system.
Any idea what might be happening?
REF
pbs_version = 2020.1.3.20210315160738
Job submitted with these directives:
#!/bin/sh
#PBS -N tester
#PBS -A CUSTOM_SCRIPT
#PBS -q ia_cfd_q
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=4:mem=10gb
#PBS -l place=exclhost
#PBS -W sandbox=PRIVATE
#PBS -m ae
#PBS -r n
#PBS -o pbs_out
#PBS -e pbs_err
qstat -sw
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
1129.cnetr-batch boydwil ia_cfd_q tester -- 1 4 10gb 01:00 Q --
Can Never Run: Insufficient amount of resource: Qlist (ia_cfd_q != cfd_q,np_cfd_q)
Scheduler log
11/01/2021 10:53:22;0400;pbs_sched;Sched;create_resresv_sets;Number of job equivalence classes: 4
11/01/2021 10:53:22;0400;pbs_sched;Node;create_node_buckets;Created node bucket ncpus=40:mem=192780mb:Qlist=cfd_q,np_cfd_q:accelerator=False
11/01/2021 10:53:22;0400;pbs_sched;Node;create_node_buckets;Created node bucket ncpus=40:mem=176652mb:Qlist=cfd_q,np_cfd_q:accelerator=False
11/01/2021 10:53:22;0400;pbs_sched;Node;create_node_buckets;Created node bucket ncpus=16:mem=773392mb:Qlist=fea_q,np_fea_q:accelerator=False
11/01/2021 10:53:22;0400;pbs_sched;Node;create_node_buckets;Created node bucket ncpus=16:mem=773392mb:Qlist=fea_shared_q:accelerator=False
11/01/2021 10:53:22;0400;pbs_sched;Node;create_node_buckets;Created node bucket ncpus=16:mem=773392mb:Qlist=fea_q,np_fea_q,ia_fat_q:accelerator=False
11/01/2021 10:53:22;0400;pbs_sched;Node;create_node_buckets;Created node bucket ncpus=40:mem=192780mb:accelerator=False
11/01/2021 10:53:22;0080;pbs_sched;Job;1129.cnetr-batch;Considering job to run
11/01/2021 10:53:22;0800;pbs_sched;Job;check_queue_max_user_run;1129.cnetr-batch user boydwil max_*user_run (-2, 1), used 0
11/01/2021 10:53:22;0800;pbs_sched;Job;check_max_user_res;1129.cnetr-batch user boydwil max_*user_res.nodect (-2.0, 16.0), used 0.0
11/01/2021 10:53:22;0400;pbs_sched;Job;1129.cnetr-batch;Chunk: 1:ncpus=4:mem=10gb:Qlist=ia_cfd_q
11/01/2021 10:53:22;0400;pbs_sched;Job;1129.cnetr-batch;Found 0 out of 1 chunks needed
11/01/2021 10:53:22;0040;pbs_sched;Job;1129.cnetr-batch;Insufficient amount of resource: Qlist (ia_cfd_q != cfd_q,np_cfd_q)
11/01/2021 10:53:22;0040;pbs_sched;Job;1129.cnetr-batch;Job will never run with the resources currently configured in the complex
qmgr -c "l q ia_cfd_q"
Queue ia_cfd_q
queue_type = Execution
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
max_queued = [o:PBS_ALL=4000]
acl_user_enable = True
from_route_only = False
resources_max.mem = 2600gb
resources_max.ncpus = 600
resources_default.mem = 40gb
resources_default.ncpus = 10
default_chunk.Qlist = ia_cfd_q
resources_available.mem = 2400gb
resources_available.ncpus = 600
max_run = [o:PBS_ALL=30]
max_run = [u:PBS_GENERIC=1]
enabled = True
started = True
qmgr -c "l n r08n30" (nodes n31 through n44 are configured with the same Qlist)
Node r08n30
Mom = r08n30
Port = 15002
pbs_version = 2020.1.3.20210315160738
ntype = PBS
state = free
pcpus = 40
resources_available.arch = linux
resources_available.host = r08n30
resources_available.mem = 197407544kb
resources_available.ncpus = 40
resources_available.Qlist = cfd_q,np_cfd_q,ia_cfd_q
resources_available.vnode = r08n30
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
license = l
last_state_change_time = 1635770783
last_used_time = 1635769766