Does SLC support system option FILELOCKWAIT?

Nico Chart_21517
Altair Employee
edited December 19 in Altair RapidMiner

Altair SLC does not recognise the SAS language system option FILELOCKWAIT; however, the system option TRANSACTEDFILEQUENCHDELAY is available. For example, you can set TRANSACTEDFILEQUENCHDELAY=6000, which effectively does the same thing as FILELOCKWAIT=6. The only difference is that FILELOCKWAIT takes a time in seconds, whereas TRANSACTEDFILEQUENCHDELAY takes a time in milliseconds.
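If you are porting a number of FILELOCKWAIT settings, the unit conversion is just a multiplication by 1000. A tiny Python helper (the function name is made up for illustration, it is not part of SLC) might look like this:

```python
def filelockwait_to_quench_delay(seconds):
    """Convert a FILELOCKWAIT value (seconds) into the equivalent
    TRANSACTEDFILEQUENCHDELAY value (milliseconds)."""
    return int(round(seconds * 1000))


if __name__ == "__main__":
    # FILELOCKWAIT=6 corresponds to TRANSACTEDFILEQUENCHDELAY=6000
    print(filelockwait_to_quench_delay(6))
```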

Here's how it works:

SLC serialises and coordinates multiple processes accessing data in a concurrent execution environment, either on a single machine or across multiple machines accessing a shared file system. This coordination is controlled using file-level locking and a number of control and guard files. Two system options affect it, TRANSACTEDFILELOCKINGBLOCKS and TRANSACTEDFILEQUENCHDELAY, and they do two separate things.

TRANSACTEDFILEQUENCHDELAY is the length of time that SLC waits after it has opened the guard file during a read operation. The guard file is used to synchronise activities that occur as a dataset or catalog is accessed. When committing a new version of a dataset, or opening a dataset for in-place modification, SLC uses this separate file as a sentinel to indicate that an activity is about to occur or is in progress. The modifier of the dataset obtains an exclusive write lock on the guard file and holds it for the duration of the activity. Readers obtain a shared read lock on the guard file before they open the data file for read. The quench delay is applied to the reader between opening the guard file and requesting the read lock; it gives a prospective writer a chance to obtain a write lock and hold the readers off so that the write can occur.
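To make the ordering of those steps concrete, here is a rough Python sketch of the guard file protocol as described above. It is only an illustration and not SLC's actual implementation: the function names, the file layout and the use of `fcntl.flock` (Unix only) are all assumptions on my part.

```python
import fcntl
import time


def write_with_guard(guard_path, data_path, payload):
    """Writer side: hold an exclusive lock on the guard file for the whole update."""
    with open(guard_path, "a") as guard:
        # Blocks until no reader or other writer holds a lock on the guard file.
        fcntl.flock(guard, fcntl.LOCK_EX)
        try:
            with open(data_path, "w") as data:
                data.write(payload)
        finally:
            fcntl.flock(guard, fcntl.LOCK_UN)


def read_with_quench(guard_path, data_path, quench_delay_ms, sleep=time.sleep):
    """Reader side: open guard, wait the quench delay, then take a shared lock."""
    with open(guard_path, "a") as guard:       # 1. open the guard file
        sleep(quench_delay_ms / 1000.0)        # 2. quench delay: leave room for a writer
        fcntl.flock(guard, fcntl.LOCK_SH)      # 3. request the shared read lock
        try:
            with open(data_path) as data:      # 4. only now open the data for read
                return data.read()
        finally:
            fcntl.flock(guard, fcntl.LOCK_UN)
```

The point to notice is step 2: the reader pauses before it joins the queue for the lock, so a writer that is already waiting has a window in which no new read lock requests arrive.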

This control was put in place to address write lock starvation, which some customers have experienced when multiple reading processes access the same data file while another process pushes updates into it. Depending on how the operating system manages the file locking queue, readers may be given priority over writers. This is a well-documented scenario: some operating systems overcome it by implementing a lock request queue with request-age priority weighting, whereas others assume that reads are quick and always serve them first, which leads to write starvation. Users experiencing this can adjust the value of TRANSACTEDFILEQUENCHDELAY to artificially provide some space for the write lock request to be serviced, at the cost of making read activities take longer to start. The control should not be used unless write starvation is actually encountered and no operating system control can resolve it. In that case, a suitable quench time will need to be discovered for the site, one that allows the write to occur amid the deluge of read requests.

This is not something that should be experienced by sites in normal use.

TRANSACTEDFILELOCKINGBLOCKS is a control that changes the way SLC interacts with file locking when accessing datasets and catalogs. By default, SLC does not block while waiting for a lock; it uses a back-off-and-retry process with an escalating delay, where the delay gets longer with each iteration. The activity will eventually fail if the required file cannot be opened within a number of iterations of this back-off-and-retry cycle.

When TRANSACTEDFILELOCKINGBLOCKS is in effect, rather than backing off and retrying, SLC stops and waits until the operating system can satisfy the lock request. The wait is of indeterminate length, and SLC proceeds only when the operating system grants the lock it has requested.
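The difference between the two modes can be sketched in Python as follows. This is a simplified illustration using an in-process `threading.Lock` in place of a real file lock; the attempt count, initial delay and growth factor are hypothetical values, since the actual numbers SLC uses are not stated here.

```python
import threading
import time


def acquire_with_backoff(lock, max_attempts=5, initial_delay=0.5,
                         factor=2.0, sleep=time.sleep):
    """Non-blocking style: try, back off with a growing delay, give up eventually."""
    delay = initial_delay
    for attempt in range(max_attempts):
        if lock.acquire(blocking=False):   # try once without waiting
            return True
        if attempt < max_attempts - 1:
            sleep(delay)                   # back off before the next attempt
            delay *= factor                # the delay gets longer each iteration
    return False                           # caller reports an error


def acquire_blocking(lock):
    """Blocking style: wait however long it takes for the lock to be granted."""
    lock.acquire()
    return True


if __name__ == "__main__":
    busy = threading.Lock()
    busy.acquire()  # simulate another process holding the lock
    ok = acquire_with_backoff(busy, max_attempts=3, initial_delay=0.01)
    print("lock obtained" if ok else "gave up after retries")
```

With the first function the caller gets a definite failure after a bounded time; with the second it never fails but may wait indefinitely.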

The default behaviour is useful for interactive SLC usage, where a failure message is preferable to an indeterminate wait. The choice only matters if SLC is interacting with a dataset or catalog that is in high demand and contended by multiple separate users, and so may suffer from write starvation or be in use by some long-running activity. With TRANSACTEDFILELOCKINGBLOCKS in effect, a long-running process that is updating an SLC dataset using modify-in-place will cause another process that is waiting to read that dataset to block and resume execution only once the file is available. If NOTRANSACTEDFILELOCKINGBLOCKS is in effect, the second process may time out while it is waiting and produce an error, failing to perform the operation it was attempting.
