Productionizing Anomaly Detection using signalAI & Panopticon
Building end-to-end anomaly detection pipelines, from model development to deployment.
In today’s world, everything from our phones to our light switches is connected to the internet. All of these devices produce a constant stream of data containing valuable information, and with suitable analytics for the use case, models can be built on top of it that bring a great deal of value to the end user. Building these kinds of applications typically involves two steps: step one is modelling, followed by deployment of the best model produced during the modelling trials. As data scientists, we (myself included) are often so focused on fine-tuning our models that we don’t give the deployment part enough attention. For an academic setting or a Kaggle competition that’s perfectly fine, but for a commercial or industrial application, deploying the developed model is a very important part of the workflow, sometimes even more important than developing the model itself. In today’s article we will discuss how, with signalAI and Panopticon, we can build an end-to-end anomaly detection pipeline.
We start our journey with an offline dataset containing a whole host of sensor readings, such as tire pressures and temperatures, brake temperatures, and wheel speeds, for a simulated F1 race car. The goal is to build an analytics model that gives us a handle on the current tire wear by consuming these readings. Tire wear takes values between 0 and 11, where 0 represents no wear and 11 represents very heavy wear. Since signalAI performs unsupervised anomaly detection, we can’t predict the discrete wear values themselves; instead, what we would like to see is, for example, tires with a wear score of 10 or 11 marked as anomalous compared to those at wear level 0 or 1. We experimented with all three algorithms in signalAI, and isolation forest was able to pick out the anomalous tire wear with a high confidence score.
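To make the modelling step more concrete, here is a minimal sketch of this kind of unsupervised isolation-forest setup. It uses scikit-learn’s IsolationForest purely as a stand-in for the signalAI node, and the column names and hyperparameter values are illustrative assumptions rather than the actual telemetry schema or tuned settings.

```python
# Minimal sketch (assumption): an isolation forest trained on simulated telemetry,
# standing in for the model produced by the signalAI node.
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical feature columns; the real dataset's schema is not shown in the article.
FEATURES = ["tire_pressure", "tire_temp", "brake_temp", "wheel_speed"]

def train_tire_wear_detector(df: pd.DataFrame) -> IsolationForest:
    """Fit an unsupervised anomaly detector on raw sensor readings."""
    model = IsolationForest(
        n_estimators=200,      # illustrative hyperparameters, not the tuned values
        contamination=0.05,    # assumed share of heavily worn tires in the data
        random_state=42,
    )
    model.fit(df[FEATURES])
    return model

def score(model: IsolationForest, df: pd.DataFrame) -> pd.Series:
    """Return -1 for anomalous (heavily worn) samples and 1 for normal ones."""
    return pd.Series(model.predict(df[FEATURES]), index=df.index, name="anomaly_flag")
```

In this sketch, heavily worn tires (wear scores around 10 or 11) would be the rows the detector flags with -1, mirroring the behaviour we saw from the signalAI isolation forest.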
Now that we have a trained model with good performance, we can start thinking about deploying it, because at the end of the day we want to use this model for inference on real-time data, and that’s where Panopticon comes into the picture. Panopticon is a web-based application specialized in displaying real-time data in a visually pleasing way. However, because it can interface with Python objects, Panopticon can also be used to run inference on a data stream with a .py model, enabled by the “python transform” tool inside Panopticon. Leveraging this capability, we embedded our trained .py model, with the optimal hyperparameters extracted from Knowledge Studio’s signalAI node, and made inferences on the incoming data streams. For consuming incoming data, Panopticon offers many options, and because of the way it handles each data structure, the inference model doesn’t need to change unless the data schema itself changes. We were therefore able to run inference on data coming from an MQTT streaming device as well as a SQL database without making any changes to the underlying Python model.
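The sketch below gives a rough idea of what the inference step embedded in the python transform can look like. It assumes a simple “DataFrame in, DataFrame out” contract, a pickled model file, and the same hypothetical column names as above; the exact interface Panopticon expects is not covered here, so treat this as an illustration rather than the actual transform code.

```python
# Hypothetical sketch of the inference step run inside Panopticon's python transform.
# The "DataFrame in, DataFrame out" contract and the model file name are assumptions
# for illustration; consult the Panopticon documentation for the exact interface.
import pickle

import pandas as pd

# Hypothetical feature columns, matching the training sketch above.
FEATURES = ["tire_pressure", "tire_temp", "brake_temp", "wheel_speed"]

# Model exported from Knowledge Studio (assumed file name and serialization format).
with open("tire_wear_isolation_forest.pkl", "rb") as f:
    MODEL = pickle.load(f)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Score each incoming batch of rows and attach anomaly columns.

    Because the function only depends on the column names, the same code works
    whether the rows arrive from the MQTT stream or the SQL database.
    """
    out = df.copy()
    out["anomaly_flag"] = MODEL.predict(df[FEATURES])            # -1 = anomalous, 1 = normal
    out["anomaly_score"] = MODEL.decision_function(df[FEATURES]) # lower = more anomalous
    return out
```

Keeping the transform dependent only on the column names is what lets the same model sit behind both the MQTT and SQL feeds: only the data connection changes, not the Python code.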
Needless to say, with more sensor readings we can build models that bring even more value on the modelling side. But as showcased above, because a Python model can be easily exported from Knowledge Studio’s signalAI and then imported into Panopticon seamlessly, we are able to quickly build end-to-end pipelines to monitor systems in real time within the Altair product line itself.
Thanks for the read, happy learning!