Fine-tuning signalAI in Compose
Optimize model performance in Compose signalAI through hyperparameter tuning
In the previous article, we discussed what the end-to-end modelling process looks like in the Compose “signalAI director” and covered the algorithms available within the director. Picking up where we left off, in today’s blog we discuss how to optimize the performance of the director’s anomaly detection models by tuning the hyperparameters exposed to us in the GUI.
Starting with isolation forest (IF), out of the parameters shown below, “Estimator”, “Contamination” & “Max features” are typically the most influential ones. “Estimator” sets the number of trees fitted under the hood of IF, which should be chosen according to the complexity of the problem at hand: the higher the number of estimators, the more complex the fitted model. “Contamination” tells isolation forest what percentage/portion of the input data points we expect to be anomalous; it is a rough estimate/guess. In some use cases, like Renishaw, we might get lucky and know the contamination factor beforehand, but if we don’t, the default value of 0.1 (10%) is a good place to start. On top of this, we can also control the number of features used to train isolation forest’s base estimators via the “Max features” hyperparameter. It has a default value of 1 (100%), which means all the features in the input data are used to train each estimator. Users can select a float value between 0 and 1 to subsample the features of the input dataset used for fitting isolation forest.
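To make these three knobs concrete, here is a minimal sketch in scikit-learn, assuming the GUI’s “Estimator”, “Contamination” & “Max features” controls map onto IsolationForest’s n_estimators, contamination and max_features parameters (an assumption on our part; the director’s internals are not shown here):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = rng.normal(size=(500, 4))  # stand-in for the input signal features

model = IsolationForest(
    n_estimators=100,   # "Estimator": number of trees; higher = more complex model
    contamination=0.1,  # "Contamination": rough guess of the anomalous fraction (10%)
    max_features=1.0,   # "Max features": fraction of features per base estimator
    random_state=42,
)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal
print("flagged anomalies:", (labels == -1).sum())
```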
In the context of local outlier factor (LOF), as shown below, LOF is set up for novelty detection by default (outlier detection and novelty detection being two distinct tasks). It can also be used for outlier detection by setting the hyperparameter “Novelty” to “False”, so this is a very important parameter to set before starting the analysis. When using LOF for novelty detection, one useful trick is to set “Contamination” to a very small number (for example 0.001 or lower) while fitting it on the input data, as for novelty detection the input data should ideally be free of anomalies during “training”. For outlier detection, “Contamination” means exactly the same thing as in isolation forest. On top of this, the hyperparameter “Algorithm” plays an important role in the overall fitting time of local outlier factor: it selects which algorithm is used under the hood to compute the nearest neighbors. There are four options available, namely “BallTree”, “KDTree”, brute-force search & “auto”. Brute-force search is typically the most expensive in terms of fitting time, and “auto” attempts to decide the most appropriate algorithm based on the values passed to the fit method. For ease of use, “auto” is the default.
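The novelty-versus-outlier switch and the neighbor-search algorithm are easiest to see in code. Below is a minimal sketch of LOF in novelty mode, again assuming the GUI wraps scikit-learn’s LocalOutlierFactor and that “Novelty”, “Contamination” & “Algorithm” map onto its novelty, contamination and algorithm parameters (an assumption):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
X_train = rng.normal(size=(400, 3))  # ideally anomaly-free "training" data
X_new = np.vstack([rng.normal(size=(10, 3)),           # normal points
                   rng.normal(loc=6.0, size=(2, 3))])  # far-off novelties

lof = LocalOutlierFactor(
    novelty=True,         # "Novelty": True = novelty detection, False = outlier detection
    contamination=0.001,  # keep very small when the training data is assumed clean
    algorithm="auto",     # "Algorithm": 'ball_tree', 'kd_tree', 'brute' or 'auto'
)
lof.fit(X_train)
print(lof.predict(X_new))  # -1 = novelty, 1 = inlier
```

Note that in novelty mode the model is fitted on clean data only and then queried on new points via predict; in outlier-detection mode (novelty set to False), fit_predict scores the training data itself.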
Now for one-class SVM (OSVM): it is typically sensitive to outliers in the training data and hence not ideal for outlier detection, but it can still be used to flag outliers by fine-tuning the hyperparameter “Nu”, which limits overfitting to those outliers. Intuitively, “Nu” is roughly equivalent to the “Contamination” hyperparameter discussed above. In addition to “Nu”, the tolerance for the stopping criterion (“Tol” in the GUI) plays an important role in training time. This parameter controls the threshold at which the iterative optimization running under the hood of OSVM is considered converged and stops. “Tol” has a default value of 0.001; users can increase it to make training a bit faster, at the cost of a slightly less precise solution.
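As a final sketch, here is how “Nu” and “Tol” would look if they correspond to scikit-learn’s OneClassSVM nu and tol parameters (again an assumption about the GUI’s backing implementation):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(1)
X = np.vstack([rng.normal(size=(300, 2)),          # bulk of normal points
               rng.uniform(-6, 6, size=(15, 2))])  # a few scattered outliers

osvm = OneClassSVM(
    nu=0.05,      # "Nu": upper bound on the fraction of training errors,
                  # roughly the share of points we expect to be outliers
    tol=1e-3,     # "Tol": stopping tolerance; larger values converge faster
    kernel="rbf",
)
labels = osvm.fit_predict(X)  # -1 = outlier, 1 = inlier
print("flagged outliers:", (labels == -1).sum())
```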
Hyperparameter tuning is one of the most important steps in the machine learning pipeline and can improve model performance to a great degree. Hopefully today’s article helps you make better decisions when choosing hyperparameters for the algorithms in the signalAI director, and hence reach better predictive accuracy.
Thanks for the read! Happy Learning!