Digital Twin Driven Anomaly Detection
Detecting anomalies with synthetic data
With prices rising on almost every product, any kind of cost reduction is welcome. One area manufacturers are increasingly looking into is data-driven methods to detect when their products might fail or need preventive maintenance. This not only reduces warranty repair costs but also, as a byproduct, makes their products more robust overall. However, training these models requires historical data representing both the “good” and “bad” states of a system, and such data is usually not readily available. That is where synthetic data can make a real difference, and in today’s article we discuss a use case where we used Altair Inspire and signalAI to build a synthetic-data-driven anomaly detection workflow for a real-world system.
For this project we centered the use case on detecting when the bearings in a motor system are reaching the end of their life, and whether the overall system is behaving in a faulty state. To do this, we took acceleration measurements from a set of bearings inserted into pillow blocks and connected by a linear shaft driven by a motor. Once we had settled on the design of the system, we used Inspire to import the CAD model consisting of the parts mentioned above, and then defined the contacts, flexible bodies, and forces, all of which can be easily added to the Inspire model. We simulated two types of anomalies/failures commonly observed in bearing systems: an offset load on the shaft, and low lubrication resulting in internal wear. This data, along with a baseline representing normal operation, was exported as .csv files and imported into signalAI for downstream analytics.
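To make that hand-off concrete, here is a minimal sketch in Python of what the downstream side of the export might look like. This is not signalAI’s API; the file names, the “acceleration” column, and the window length are assumptions for illustration. Statistical features such as RMS and kurtosis are classic indicators in bearing-wear analysis:

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis

# Hypothetical file names for the exported Inspire results; adjust to your export.
FILES = {
    "baseline": "baseline.csv",
    "offset_load": "offset_load.csv",
    "low_lubrication": "low_lubrication.csv",
}

def load_signal(path: str) -> np.ndarray:
    """Load an exported signal; assumes a column named 'acceleration'."""
    return pd.read_csv(path)["acceleration"].to_numpy()

def window_features(signal: np.ndarray, window: int = 1024) -> pd.DataFrame:
    """Split the signal into fixed-length windows and compute per-window features."""
    rows = []
    for i in range(len(signal) // window):
        w = signal[i * window : (i + 1) * window]
        rms = np.sqrt(np.mean(w ** 2))
        rows.append({
            "rms": rms,                          # overall vibration energy
            "peak": np.max(np.abs(w)),           # impact severity
            "crest": np.max(np.abs(w)) / rms,    # spikiness relative to energy
            "kurtosis": kurtosis(w),             # impulsiveness, sensitive to wear
        })
    return pd.DataFrame(rows)

features = {name: window_features(load_signal(path)) for name, path in FILES.items()}
```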
With signalAI, we built two types of analytical models from these .csv files. In the first instance, we treated the problem as a purely unsupervised outlier detection use case: we intentionally polluted the baseline data with 10% anomalous data, fed the mixture into the outlier detection algorithms in the signalAI node, and checked how many of those bad samples signalAI could identify as anomalous. In the second instance, we approached it as a novelty detection problem: we first trained our models on the baseline data only, then ran inference on the anomalous data with the trained models to check whether the algorithms could successfully flag those instances as anomalous. In both cases, the models achieved very good accuracy, above 98%.
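signalAI’s internal algorithms are not exposed in this article, so as an illustration here is a minimal sketch of both setups using scikit-learn as a stand-in. The choice of IsolationForest and LocalOutlierFactor, and the synthetic feature matrices, are our assumptions; only the 10% pollution ratio and the train-on-baseline protocol come from the workflow described above:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic stand-ins for the per-window feature tables (replace with real exports):
# baseline windows cluster tightly, anomalous windows are shifted and more spread out.
baseline = rng.normal(0.0, 1.0, size=(900, 4))
anomalous = rng.normal(3.0, 2.0, size=(100, 4))

# --- Setup 1: outlier detection (unsupervised) ---------------------------
# Pollute the baseline with ~10% anomalous windows, as in the article.
polluted = np.vstack([baseline, anomalous])
labels = np.r_[np.zeros(len(baseline)), np.ones(len(anomalous))]  # 1 = anomalous

iso = IsolationForest(contamination=0.1, random_state=0).fit(polluted)
flagged = iso.predict(polluted) == -1          # -1 means flagged as an outlier
print("anomalies caught:", flagged[labels == 1].mean())

# --- Setup 2: novelty detection (train on baseline only) -----------------
lof = LocalOutlierFactor(novelty=True).fit(baseline)
print("anomalous windows flagged:", (lof.predict(anomalous) == -1).mean())
```

The key design difference is visible in what each model is fitted on: the outlier detector sees the polluted mixture and must separate it internally, while the novelty detector learns the baseline alone and judges everything else against it.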
To conclude: to gather data representing both the “good” and “bad” states of a system, we either have to wait until the physical system develops some kind of wear, or we have to intentionally misadjust parameters to provoke anomalous behavior, which can easily damage the system itself. Either way it is an expensive process: expensive in time in the first case, and in money in the second. Using simulated data instead lets us generate the required data in a fraction of the time, without damaging the actual system. On top of this, simulation lets us generate data for specific failure cases, which can then be used to develop real-time fault detection models. Of course, what we have showcased here is a relatively simple digital twin, but more sophisticated ones can be built by simply adding more detail to the simulation process.
Thanks for the read, happy learning!