How to use FEKO with a job scheduling / queueing system?
It is recommended that Altair PBS Works (http://www.pbsworks.com/AboutGT.aspx?d=About,-About-PBS-Works), with PBS Professional, PBS Compute Manager and PBS Display Manager, be used. The details below concern general queuing system usage with FEKO.
FEKO integrates with various job scheduling and queuing systems such as Torque, PBS Pro, LSF, Parallelnavi NQS, SLURM, Oracle Grid Engine (formerly Sun Grid Engine) and LoadLeveler. This How-To provides information on using such queuing systems, along with sample scripts for PBS and LSF.
Since FEKO Suite 5.4 (July 2008), direct integration with the job scheduling and queuing systems mentioned above has been provided through Intel MPI. Advanced users can consult the Intel MPI Getting Started Guide and Reference Manual shipped with FEKO for more information; only basic information is given here. Other platforms (e.g. SGI Altix, which uses SGI MPT) require similar job scripts; PBS and LSF are discussed below.
How complex a job script is depends on the user's needs. For example, a script can also specify the anticipated memory requirement or the expected run time, so that the job receives a SIGXCPU signal when it exceeds its CPU time limit. FEKO handles all of this properly.
In the simplest case, users submit the job script to the appropriate queue (e.g. long job, short job):
qsub job_script.sh
Inside the job script, all that needs to be done is to call RUNFEKO (with the full path to the application) using the --use-job-scheduler command line option, which activates the queuing system integration with Intel MPI:
/opt/feko/bin/runfeko <filename> --use-job-scheduler
(In this case, the machine file, i.e. which nodes are used, and the number of parallel processes are all obtained from the queuing system.) Any additional specifications in the job script, such as the number of nodes to be used, should be made using the syntax of the queuing system in use; see also the sample job scripts below.
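A minimal PBS job script along these lines might look as follows. It is a sketch, not an official template: the job name, the queue name long, the resource request and the model name my_model are hypothetical, and nodes=2:ppn=8 is Torque-style syntax (PBS Professional requests resources with -l select=..., e.g. select=2:ncpus=8:mpiprocs=8). The check on PBS_JOBID is an added convenience so the script can be dry-run outside PBS.

```shell
#!/bin/bash
#PBS -N feko_job
#PBS -q long
#PBS -l nodes=2:ppn=8
# Minimal PBS job script (sketch). Job name, queue, resource request and
# model name are hypothetical examples.

# Hypothetical model name: the <filename> passed to RUNFEKO.
MODEL="my_model"
RUNFEKO="/opt/feko/bin/runfeko"

# With --use-job-scheduler, the hosts and the number of parallel processes
# come from the queuing system, so no -np or machines file is given.
cmd=("$RUNFEKO" "$MODEL" --use-job-scheduler)

# PBS starts jobs in the home directory; return to the submission directory.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# PBS sets PBS_JOBID inside a job; outside one, only print the command.
if [ -n "${PBS_JOBID:-}" ]; then
  "${cmd[@]}"
else
  echo "Not inside a PBS job; would run: ${cmd[*]}"
fi
```

Submit it with qsub job_script.sh as above; resource requests live in the #PBS lines rather than on the RUNFEKO command line.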
When Intel MPI cannot be used (e.g. SGI Altix with SGI MPT), FEKO can still be used with queuing systems. For PBS, for instance, the batch system provides a file listing the hosts and number of CPUs, whose path is given by the environment variable $PBS_NODEFILE. All that needs to be done is to call the FEKO launcher from the job script with something like:
/opt/feko/bin/runfeko <filename> -np 16 --machines-file $PBS_NODEFILE
Note that the file name and the number of processes (16 here) can also be passed to the script as parameters.
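The machines-file approach can be sketched as a job script that works under either PBS or LSF. Assumptions, all hypothetical: MODEL and NP are job parameters passed as environment variables (e.g. qsub -v MODEL=my_model,NP=16 job.sh); under LSF the host file is built from $LSB_HOSTS, which lists one host name per allocated slot, assuming runfeko accepts the same one-host-per-line format as $PBS_NODEFILE. If NP is not set, one process per allocated slot is used.

```shell
#!/bin/bash
# Sketch: run FEKO with an explicit machines file instead of
# --use-job-scheduler (e.g. where Intel MPI cannot be used).
# MODEL and NP are hypothetical job parameters, e.g.:
#   qsub -v MODEL=my_model,NP=16 job.sh

# Print the path of a host file with one line per slot. PBS provides one
# in $PBS_NODEFILE; for LSF, build one from $LSB_HOSTS.
machines_file() {
  if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "$PBS_NODEFILE"
  elif [ -n "${LSB_HOSTS:-}" ]; then
    local f="${TMPDIR:-/tmp}/feko_machines.$$"
    # $LSB_HOSTS is left unquoted on purpose so each host becomes one line.
    printf '%s\n' $LSB_HOSTS > "$f"
    echo "$f"
  else
    return 1
  fi
}

# Count the non-empty lines (slots) in a host file.
count_slots() {
  grep -c . "$1"
}

main() {
  local mf np
  mf=$(machines_file) || { echo "No PBS or LSF host list found" >&2; return 1; }
  # Default to one FEKO process per slot granted by the scheduler.
  np="${NP:-$(count_slots "$mf")}"
  cd "${PBS_O_WORKDIR:-.}" || return 1
  /opt/feko/bin/runfeko "${MODEL:?set MODEL to the FEKO model name}" \
    -np "$np" --machines-file "$mf"
}

# Only start FEKO inside a PBS or LSF job (PBS_JOBID / LSB_JOBID are set there).
if [ -n "${PBS_JOBID:-}${LSB_JOBID:-}" ]; then
  main
else
  echo "Not inside a PBS or LSF job; nothing started." >&2
fi
```

Deriving NP from the host file keeps the process count consistent with what the scheduler actually granted; setting NP explicitly overrides it.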
For advanced usage, many further features are available. For example, stdout and stderr can be merged into one file (recommended) using
#QSUB -eo
a time limit can be set with something like
#QSUB -lp_mpp_t=3600
and, for checking or tracing purposes (in particular with longer jobs), it can be useful to write stdout while the request is still being processed, using something like
#QSUB -ro
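These examples are generic batch-system directives shown with the #QSUB prefix; PBS itself uses the #PBS prefix (for example, #PBS -j oe merges stderr into stdout).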
If desired, e-mail notifications can also be sent when the job starts, finishes, etc.
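In PBS, options like these are written as #PBS directives at the top of the job script, before the first command. A sketch with placeholder values (the job name, time limit, memory size and e-mail address are hypothetical):

```shell
# PBS directive header (sketch); place before the first command of the job script.
#PBS -N feko_job
# Merge stderr into stdout (one output file).
#PBS -j oe
# Wall-clock limit as hh:mm:ss (example value: one hour).
#PBS -l walltime=01:00:00
# Anticipated memory requirement (example value).
#PBS -l mem=32gb
# E-mail on abort (a), begin (b) and end (e) to the given address.
#PBS -m abe
#PBS -M user@example.com
```

For LSF, the corresponding #BSUB options are -o (output file, which also receives stderr when -e is not given), -W (run limit as [hours:]minutes), -B and -N (mail at job start and finish) and -u (mail address).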
Three sample job scripts are attached below: a simple one using the --use-job-scheduler option and two more fully featured ones, for instance for SGI MPT.
Finally, FEKO ships with a component, QueueFEKO, which automatically sets up job scripts based on user-defined options and packs all the input files required for a run into an archive, which can also be encrypted for additional security. See the QueueFEKO description in the FEKO manuals.