"Newbie: help with unsupervised anomaly detection with RapidMiner"
max001
New Altair Community Member
Hello,
After I managed to build a project doing data classification, I would like to ask for advice on how to build a project doing "unsupervised anomaly detection".
http://en.wikipedia.org/wiki/Anomaly_detection
I would appreciate a "pointer" to the right model to use, or tutorial on this topic - as a hint.
My problem... (with some simplifications):
I have a temperature sensor that reports a reading every minute, and I have 30 days of these readings - my "training data".
I have no idea whether the history contains any anomaly ("issue") related to the temperature, or when it happened - I have just the data itself. So classification models don't seem relevant, at least at my newbie level of understanding...
Then I have the temperature data for the last hour, also reported once per minute.
My goal is to apply a reasonable heuristic that tells me the probability that this "hour" represents an "anomaly" compared to the training data. Right now I have some freedom in how I define "anomaly", but it should reflect real-world scenarios like "too high", "too low", "too volatile", "too steady".
At the 2nd stage, I will need to analyze the information by day of the week (assuming the temperature changes reflect some weekly "trends").
Thanks for any hint,
Max
Best Answer
-
Hi Max,
You should have a look at the Outlier operators, especially Outlier Detection (LOF). It calculates the Local Outlier Factor for each example, a numeric measure where higher values indicate that the example is more likely to be an outlier.
You can manually create a label which is true for all values above a certain threshold, and false otherwise. If you then build a descriptive model, e.g. a decision tree, which classifies the examples into true or false, you will see why the respective examples are outliers.
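For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the LOF-plus-threshold approach. It is not the RapidMiner operator. Everything in it is an assumption made for illustration: the synthetic temperature data, the choice of (mean, standard deviation) as per-hour features, k = 3, and the threshold of 1.5. It also simplifies LOF by using exactly k nearest neighbours and breaking ties by index.

```python
import math
import statistics

def hourly_features(readings):
    """Summarise one hour of per-minute temperatures as (mean, std dev)."""
    return (statistics.fmean(readings), statistics.pstdev(readings))

def lof_scores(points, k=3):
    """Local Outlier Factor per point (simplified: exactly k neighbours)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(1.0 / max(statistics.fmean(reach), 1e-12))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        statistics.fmean([lrd[j] for j in neigh[i]]) / lrd[i]
        for i in range(n)
    ]

# Hypothetical data: 12 "normal" historical hours plus one much warmer hour.
history = [
    [20.0 + 0.1 * h + 0.2 * math.sin(m / 5.0 + h) for m in range(60)]
    for h in range(12)
]
new_hour = [27.0 + 0.2 * math.sin(m / 5.0) for m in range(60)]

# One feature vector per hour; the new hour is the last point.
points = [hourly_features(hour) for hour in history + [new_hour]]
scores = lof_scores(points, k=3)

# Manual label: True where the LOF exceeds an (assumed) threshold.
THRESHOLD = 1.5
labels = [score > THRESHOLD for score in scores]

print(f"LOF of new hour: {scores[-1]:.2f}, flagged: {labels[-1]}")
```

Scores close to 1 mean the hour sits in a region about as dense as its neighbours; scores well above 1 mean it is isolated. In practice the features should probably be normalised so that no single one dominates the distance, and the threshold needs tuning on real data.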
Best regards,
Marius
Answers
-
Thanks a lot,
Max