PBS Pro Job Is Stuck in the Wait (W) State. How Do I Run My Job Now?


In PBS Pro, W is the wait state. In the situation described here it signals an error: the job's input files could not be transferred (staged in) to the compute node before the job started.

The job has in fact been scheduled onto a node and is shown in the R (running) state. Before the job's processes can start on the compute node, however, PBS must copy the job's input files there. If that copy fails, the job's state changes from 'R' to 'W' and stays there.
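To spot jobs in this condition, you can scan qstat output for the W state. The following is a minimal Python sketch, assuming the default six-column qstat table (Job id, Name, User, Time Use, S, Queue); the function names are hypothetical, and the parser will need adjusting if your site's qstat prints a different layout.

```python
import subprocess


def jobs_in_state(qstat_output: str, state: str = "W") -> list[str]:
    """Return the job IDs whose state column (S) equals `state`.

    Assumes the default qstat layout with six whitespace-separated columns:
    Job id, Name, User, Time Use, S, Queue.
    """
    job_ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator line and any malformed line.
        if len(fields) != 6 or fields[0] == "Job" or fields[0].startswith("-"):
            continue
        if fields[4] == state:
            job_ids.append(fields[0])
    return job_ids


def stuck_jobs() -> list[str]:
    """Run qstat on this host and return the IDs of jobs in the W state."""
    result = subprocess.run(["qstat"], capture_output=True, text=True, check=True)
    return jobs_in_state(result.stdout)
```

Parsing is kept separate from running qstat so the parser can be checked against saved output without a PBS installation.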

Possible causes:

1. Passwordless scp from the compute node to the head node is not working for the job owner.

2. The destination directory on the compute node does not have enough free disk space.

3. The destination directory does not have permissions that allow the input files to be copied into it.

4. An input file name contains spaces or special characters.
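Causes 1 to 4 can be checked by hand, but a small script run as the job owner on the compute node can speed this up. The following is a minimal Python sketch, not an official PBS tool: the function names, the "safe file name" character set, and the use of `ssh -o BatchMode=yes` as a stand-in for testing passwordless scp are all assumptions.

```python
import os
import re
import shutil
import subprocess

# Hypothetical "safe" character set: letters, digits, '.', '_', '/', '-'.
# Any other character (including a space) is flagged as a possible cause 4.
SAFE_NAME = re.compile(r"[A-Za-z0-9._/-]+")


def unsafe_file_names(paths: list[str]) -> list[str]:
    """Cause 4: return the paths that contain spaces or other special characters."""
    return [p for p in paths if SAFE_NAME.fullmatch(p) is None]


def has_free_space(directory: str, needed_bytes: int) -> bool:
    """Cause 2: True if the filesystem holding `directory` has enough free space."""
    return shutil.disk_usage(directory).free >= needed_bytes


def is_writable(directory: str) -> bool:
    """Cause 3: True if the current user can create files in `directory`."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


def passwordless_ssh_works(head_node: str, timeout: int = 10) -> bool:
    """Cause 1 (approximation): True if ssh to `head_node` needs no password.

    BatchMode=yes makes ssh fail instead of prompting for a password. OpenSSH
    scp authenticates the same way, so success here suggests scp will too.
    """
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
             head_node, "true"],
            capture_output=True,
            timeout=timeout + 5,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

For example, with a hypothetical head node `headnode` and destination `/scratch/alice`, you might call `passwordless_ssh_works("headnode")`, `is_writable("/scratch/alice")` and `has_free_space("/scratch/alice", 5 * 1024**3)`.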


How do I fix it?

First fix the root cause (one of points 1 to 4 above). Then either:

1. Delete the job with qdel and resubmit it with qsub.

or

2. If you want the same job to run without resubmitting it, you may need to restart the pbs_server process. The job should then return to the Q (queued) state and eventually run.
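Option 1 can be scripted. This is a minimal Python sketch, assuming you still have the original job script and that it carries its options as #PBS directives (otherwise pass the same qsub options again); the function names are hypothetical. It relies only on qsub printing the new job identifier on standard output.

```python
import subprocess


def resubmit_commands(job_id: str, job_script: str) -> list[list[str]]:
    """Return the commands for option 1: delete the stuck job, then resubmit."""
    return [["qdel", job_id], ["qsub", job_script]]


def resubmit(job_id: str, job_script: str) -> str:
    """Delete `job_id`, resubmit `job_script`, and return the new job ID.

    Assumes (hypothetically) that the script holds its options as #PBS
    directives; qsub prints the new job identifier on stdout.
    """
    delete_cmd, submit_cmd = resubmit_commands(job_id, job_script)
    subprocess.run(delete_cmd, check=True)  # raises CalledProcessError on failure
    submitted = subprocess.run(submit_cmd, capture_output=True, text=True, check=True)
    return submitted.stdout.strip()
```

For example, `resubmit("102.headnode", "job.pbs")` would delete a hypothetical stuck job `102.headnode` and resubmit `job.pbs`. Building the command lists separately makes it easy to print them for a dry run before executing anything.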