Daily checks and monitoring suggestions for Altair SLC Hub and Workers

Nico Chart_21517
Altair Employee
edited December 2023 in Altair RapidMiner

Some suggestions for your IT or Applications Support team to follow:

1. On the hub portal, check the status of the workers.

2. Check that the hub services are running on the hub and that the worker service is running on each worker.

3. Run 'hubctl verify all' and look for any unexpected errors or warnings.

4. If you have configured WORK libraries away from the default setting (under the nomad temporary directories), check that the 'cleanwork' utility is scheduled to run regularly (e.g. daily) to clean up redundant temporary files.

5. Utilize IT-style monitoring of machine statuses to watch for any excessive memory use, defunct processes, disk thrashing, etc.

6. Invoke the "Hello World" API and check that you get a successful response.
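
As a rough illustration of step 3, the sketch below runs 'hubctl verify all' and picks out lines that mention errors or warnings. The output format of hubctl is not described here, so the keyword matching ("error", "warn") and the function names are assumptions for illustration only; adjust them to what your installation actually prints.

```python
import subprocess

# Assumption: problem lines contain one of these words (case-insensitive).
KEYWORDS = ("error", "warn")


def find_problem_lines(output):
    """Return the stripped lines of `output` that mention an error or warning."""
    return [
        line.strip()
        for line in output.splitlines()
        if any(keyword in line.lower() for keyword in KEYWORDS)
    ]


def run_verify():
    """Run 'hubctl verify all' and return (exit code, suspicious lines).

    Assumes hubctl is on the PATH of the account running this script.
    """
    result = subprocess.run(
        ["hubctl", "verify", "all"],
        capture_output=True,
        text=True,
    )
    combined = result.stdout + "\n" + result.stderr
    return result.returncode, find_problem_lines(combined)


# Example use on the hub machine:
#   code, problems = run_verify()
#   for line in problems:
#       print(line)
```

A scheduled job could call run_verify() each morning and send the returned lines to whoever is on support duty. Keyword matching can produce false positives, so treat the result as a prompt to read the full output rather than as a verdict.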

In addition, it would be good to set calendar reminders for checking:
- the expiry dates of your SSL certificates.
- the expiry dates of your software licenses (Altair sales accounting may remind you!)
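
If you prefer an automated check alongside the calendar reminder, the certificate expiry date can be read directly from the server. This is a minimal sketch using only the Python standard library; the function names and the host name in the example are hypothetical placeholders, and it assumes the hub serves HTTPS with a certificate that the machine running the script trusts.

```python
import socket
import ssl
from datetime import datetime, timezone


def cert_not_after(host, port=443, timeout=10.0):
    """Connect to host:port over TLS and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until(not_after, now=None):
    """Return the number of days between `now` (default: current UTC time)
    and a notAfter string such as 'Jan  1 00:00:00 2030 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400


# Example use (host name is a placeholder):
#   remaining = days_until(cert_not_after("hub.example.com"))
#   if remaining < 30:
#       print(f"Certificate expires in {remaining:.0f} days")
```

With the default context, a self-signed or privately issued certificate will fail verification unless its CA is in the local trust store, so the connection step may need adapting to your environment.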

Check that your backup schedule is performing backups according to your expectations. (And verify that you can restore from those backups.)

(You should back up both the S3 object storage and the PostgreSQL database.)

 
