A Visual Introduction to Novelty Detection
Detecting that something is different is easier to understand with this example
Anomaly detection plays a surprisingly common role in engineering data science. The need to identify data that is not like the rest arises in design exploration regressions, in monitoring additive manufacturing processes, and in geometry recognition. After working with the subject for some time, I now find technical concepts like outliers and novelties second nature, but I admit that at first I struggled to tell the two terms apart. Can an outlier be a novelty? Aren’t they the same? I finally understood the difference through a simple example, which I’ll share here using a conceptual application of Altair’s shapeAI technology.
To begin, imagine a dataset that consists of only four geometries: three cars and one airplane wing. The first question we might ask of this dataset is “are there outliers within the data?” This is an inward inquiry into the data itself, a form of descriptive statistics. This situation is illustrated in the image below.
Clearly, the airplane wing is a geometry unlike any other datum, so it is considered an outlier. The result may seem obvious, but it raises the next question: “if the wing is an outlier, is it also a novelty?” Perhaps counterintuitively, the answer is no. The same wing would not be a novelty, as diagrammed here.
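To make the inward-looking nature of the outlier question concrete, here is a minimal sketch in Python. It assumes each geometry has already been reduced to a small feature vector; the two descriptors (an aspect ratio and a thickness ratio), their values, and the distance cut-off of 1.0 are all made up for illustration and are not shapeAI output. Each item is compared only against the other items in the same dataset, using its leave-one-out nearest-neighbor distance.

```python
import math

# Hypothetical 2-D shape descriptors: (aspect ratio, thickness ratio).
# These values are invented for illustration; they are not shapeAI output.
DATASET = {
    "sedan": (2.5, 0.80),
    "hatchback": (2.2, 0.85),
    "suv": (2.4, 0.95),
    "wing": (8.0, 0.12),
}

# Assumed distance cut-off, in descriptor units.
THRESHOLD = 1.0


def find_outliers(data, threshold=THRESHOLD):
    """Flag items whose nearest *other* item is farther than `threshold`.

    This is an inward-looking check: every item is compared only with the
    rest of the same dataset (a leave-one-out nearest-neighbor distance).
    """
    outliers = []
    for name, features in data.items():
        nearest = min(
            math.dist(features, other)
            for other_name, other in data.items()
            if other_name != name
        )
        if nearest > threshold:
            outliers.append(name)
    return outliers


print(find_outliers(DATASET))  # ['wing']
```

No new input is involved: the question is answered entirely from the four stored geometries. A fixed threshold keeps the sketch simple; practical detectors usually derive their cut-off from the data instead.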
In contrast to the inward-looking context of an outlier, a novelty is an outward-looking assessment that requires a new input to make a prediction. Given our dataset, the wing itself is not a novelty because it is similar to a datum already in the dataset. In the diagram above, the input geometry is a trivial match for the wing already in the dataset, but the same idea would hold even if a different wing were used as input to the predictor, as shown here.
In either case, the novelty prediction model decides “I’ve seen something like this in the dataset”; it is not new. For the wing to be considered a novelty, it would have to be removed from the original dataset, as shown below.
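The outward-looking novelty question can be sketched the same way. This is a minimal illustration, not shapeAI’s method: it assumes invented 2-D descriptors (aspect ratio, thickness ratio), a hypothetical second wing, and an assumed distance cut-off of 1.0, and it swaps in a simple nearest-neighbor rule for a real novelty detector. The reference set is held fixed, and only the new input is judged against it.

```python
import math

# Hypothetical 2-D shape descriptors: (aspect ratio, thickness ratio).
# Values are invented for illustration; they are not shapeAI output.
CARS = {
    "sedan": (2.5, 0.80),
    "hatchback": (2.2, 0.85),
    "suv": (2.4, 0.95),
}
WING = {"wing": (8.0, 0.12)}

THRESHOLD = 1.0  # assumed distance cut-off, in descriptor units


def is_novel(reference, query, threshold=THRESHOLD):
    """Return True if `query` is farther than `threshold` from every item
    in `reference`. The reference set is fixed; only the new input is judged."""
    nearest = min(math.dist(query, features) for features in reference.values())
    return nearest > threshold


new_wing = (7.5, 0.10)  # a different, hypothetical wing

with_wing = {**CARS, **WING}
print(is_novel(with_wing, WING["wing"]))  # False: exact match
print(is_novel(with_wing, new_wing))      # False: similar to the stored wing
print(is_novel(CARS, new_wing))           # True: nothing like it in the data
```

The only difference between the last two calls is whether the wing is part of the reference set, which is exactly the distinction drawn in the diagrams. Dedicated novelty detectors such as one-class SVMs or Local Outlier Factor learn a decision boundary from the reference data rather than relying on a hand-picked threshold.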
I find the above imagery a good tool for remembering the difference between outliers and novelties. Although they share some similarities, they are distinct ideas within the framework of anomaly detection. Besides serving different purposes, the same geometry can be one and not the other. I hope this helps clarify the concepts, and I look forward to seeing anomaly detection play an increasingly large role in engineering data science.
Comments
Outlier detection is unsupervised ML. What about novelty detection? Is it also unsupervised ML? Can most outlier detection algorithms also work for novelty detection and vice versa?
Novelty detection falls under the semi-supervised category, because the model must first be trained on "good" data points before it can classify incoming samples as novelty vs. non-novelty during inference.
Most novelty detection algorithms can also be used for outlier detection with some hyperparameter tuning, but the converse is not typically true.