AI-Powered Outlier and Novelty Detection
“Data is the new oil”. Let’s use it to detect outliers & novelties!
As a ’90s kid, I count “Men in Black” among my favorite sci-fi movies. One scene that cracks me up no matter how many times I watch it is when Will Smith, during the field test of his recruitment interview, shoots the cardboard cutout of little Tiffany. If we think about it, what Will Smith is doing there is basically finding the odd one out, aka anomaly detection, and that’s what I am going to discuss today. Within the domains of data analytics & AI, under the fame & glare of all the new neural networks, anomaly detection sometimes doesn’t get the attention it deserves. So let’s dig a bit deeper and learn what some of the common techniques for anomaly detection are, how they differ from each other, and in which scenarios we should use which models.
The field of anomaly detection mainly consists of two sub-domains, namely outlier detection & novelty detection. Though it’s very easy to confuse these two terms, and people sometimes mix them up casually, they are technically quite different. Outlier detection is an unsupervised learning method which tries to find abnormalities in the input data. Because it is completely unsupervised, there is no separate “training phase” as such: the model flags outliers directly within the samples we feed it, even though those samples may themselves contain outliers. Novelty detection, in contrast, is a semi-supervised method which first learns the signatures/patterns of the training data and then decides whether a new data point is an outlier or not. In the first phase we train the algorithm on the data in hand, which ideally is not polluted by outliers; then, when a new observation comes in, the trained model determines whether that sample is an outlier. In this context, an outlier is called a novelty.
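To make the distinction concrete, here is a minimal pure-Python sketch. It does not use any of the algorithms discussed in this article or any signalAI API: as a stand-in, it scores each point by its mean distance to its k nearest neighbors, and the names `knn_scores`, `detect_outliers`, `NoveltyDetector`, `k` and `threshold` (including the value 3.0) are hypothetical choices for illustration only. In outlier mode the batch is scored against itself with no training phase; in novelty mode the detector first memorizes clean training data and then judges new points against it.

```python
import math

# Hypothetical k-NN distance detector used as a stand-in for illustration;
# it is not signalAI and not one of the algorithms discussed in the article.


def knn_scores(queries, reference, k, exclude_self=False):
    """Mean distance from each query point to its k nearest points in `reference`.

    With exclude_self=True, `queries` must be `reference` itself and each
    point's zero distance to itself is skipped.
    """
    scores = []
    for i, q in enumerate(queries):
        dists = sorted(
            math.dist(q, r)
            for j, r in enumerate(reference)
            if not (exclude_self and i == j)
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def detect_outliers(batch, k=2, threshold=3.0):
    """Outlier detection: no training phase, the batch is scored against itself."""
    return [s > threshold for s in knn_scores(batch, batch, k, exclude_self=True)]


class NoveltyDetector:
    """Novelty detection: learn from (ideally) clean data, then judge new points."""

    def __init__(self, k=2, threshold=3.0):
        self.k = k
        self.threshold = threshold  # hypothetical fixed cut-off
        self.train = []

    def fit(self, clean_data):
        # Training phase: here it simply memorizes the outlier-free reference set.
        self.train = list(clean_data)
        return self

    def predict(self, new_points):
        # Each new point is scored only against the training data.
        return [s > self.threshold for s in knn_scores(new_points, self.train, self.k)]


# Outlier detection: the batch itself contains the odd one out.
print(detect_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]))
# [False, False, False, False, True]

# Novelty detection: fit on clean data first, then score incoming samples.
detector = NoveltyDetector().fit([(0, 0), (0, 1), (1, 0), (1, 1)])
print(detector.predict([(0.5, 0.5), (8, 8)]))
# [False, True]
```

In scikit-learn, the same split shows up as `LocalOutlierFactor().fit_predict(X)` for outlier detection versus `LocalOutlierFactor(novelty=True).fit(X_train).predict(X_new)` for novelty detection.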
Now that we have discussed the distinction between outlier and novelty detection, let’s go over some of the most common data-driven techniques used for them. Two very efficient algorithms for both outlier and novelty detection are Isolation Forest (IF), suited to high-dimensional data, and Local Outlier Factor (LOF), suited to moderately high-dimensional data. IF is a tree-based method which ‘isolates’ observations by recursively partitioning the data with random splits; anomalies are isolated in fewer splits, so their average path length from the root node is shorter. LOF, on the other hand, flags anomalies by comparing the local density of a sample with the local densities of its nearest neighbors: a sample lying in a noticeably sparser region than its neighbors receives a high score. Along with these two approaches, One Class SVM (OSVM), based on support vector machines, is another very effective tool which identifies anomalies depending on which side of a frontier, learned from the support vectors, a sample lies. OSVM is typically very sensitive to outliers in its training data and hence not very good for outlier detection, but it is well suited to novelty detection, where the training data is not much polluted by outliers.
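The local-density comparison behind LOF can be sketched in a few lines of pure Python, following the standard definitions (k-distance, reachability distance, local reachability density). This is a minimal illustration assuming no duplicate points and breaking distance ties by index, not the implementation used in signalAI or Knowledge Studio; `lof_scores` and the example points are hypothetical. It shows why comparing against the neighbors matters: both square clusters score about 1 even though one is spaced 50 times more widely than the other, while the stray point next to the tight cluster stands out.

```python
import math

# Minimal LOF sketch for illustration only; not the signalAI implementation.


def lof_scores(points, k=2):
    """Local Outlier Factor of every point in `points` (batch mode).

    A score near 1 means a point is about as dense as its neighbors;
    a score well above 1 means it sits in a sparser region than they do.
    Assumes no duplicate points; distance ties are broken by index.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Indices of the k nearest neighbors of each point, excluding itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]

    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [dist[i][neighbors[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, o) = max(k-distance(o), d(i, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbors[i]]
        lrd.append(k / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]


dense = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1)]   # tight cluster
sparse = [(10, 10), (10, 15), (15, 10), (15, 15)]  # 50x wider spacing
stray = [(1, 1)]                                   # just outside the tight cluster

scores = lof_scores(dense + sparse + stray, k=2)
print([round(s, 2) for s in scores])
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 13.09]
```

A global density threshold would flag the wide cluster first, since its points are less dense in absolute terms than the stray point at (1, 1); LOF flags only (1, 1). For real workloads, scikit-learn’s `IsolationForest`, `LocalOutlierFactor` and `OneClassSVM` classes implement the three algorithms discussed above.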
If you have read this far, I am guessing you are now a bit more familiar with the field of anomaly detection and some of its most useful and commonly used algorithms, and hopefully you will feel more confident the next time terms like anomaly detection, outlier detection and novelty detection are thrown around. Now that we have covered the theory, I am pretty sure that, like me, the engineer in you is looking to apply these techniques within the Altair product line and to expand on them in real-life applications to get our hands dirty. Well, I have good news! One of our solutions, signalAI, is designed to tackle this very field of anomaly detection, and under its umbrella all three algorithms discussed above are now available off the shelf in Knowledge Studio (2021.3) and as a library within Compose. In terms of use cases, please feel free to give this article a read on how we have used signalAI to monitor the health status of bearings running in the field in real time. On the client adoption front, Renishaw has recently used signalAI to detect anomalies in their additive manufacturing processes. Please stay tuned for the next article to learn more about this customer success story.
Thanks for the read! Happy Learning!
Comments
This is an awesome blog! Thanks for sharing.
Very nice blog Angkit, I love the introduction! Thanks a lot for sharing this information.