
"Newbie: help with unsupervised anomaly detection with RapidMiner"

User: "max001"
New Altair Community Member
Updated by Jocelyn
Hello,

After building a data classification project, I would like to ask for advice on how to build a project that does "unsupervised anomaly detection":
http://en.wikipedia.org/wiki/Anomaly_detection

I would appreciate a "pointer" to the right model to use, or a tutorial on this topic, as a hint.

My problem (somewhat simplified):

I have a temperature sensor that reports the temperature every minute; 30 days of these readings are my "training data".

I do not know whether the history contains any temperature anomalies ("issues"), or when they occurred; I only have the data itself. So, at least to my newbie understanding, classification models are not applicable.

I also have temperature data for the last hour, reported every minute.

My goal is to apply a reasonable heuristic that tells me the probability that this hour represents an "anomaly" compared with the training data. For now I have some freedom to define "anomaly", but it should reflect real-world scenarios such as "too high", "too low", "too volatile" and "too steady".
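One way to turn these four kinds of anomaly into numbers is sketched below in Python. It rests on assumptions that do not come from the thread: exactly one reading per minute, non-overlapping one-hour windows, and a cut-off of 3 standard deviations. Each hour is described by its mean temperature ("too high"/"too low") and by the spread of its minute-to-minute changes ("too volatile"/"too steady"), and the last hour is compared with all hourly windows of the training month. The result is a z-score, not a calibrated probability; the helper names are hypothetical.

```python
import random
from statistics import mean, stdev

WINDOW = 60  # assumption: one reading per minute, so 60 readings make one hour

def window_features(readings):
    """Return (level, volatility) for one window of per-minute readings.

    level      -> mean temperature ("too high" / "too low")
    volatility -> std. dev. of minute-to-minute changes ("too volatile" / "too steady")
    """
    diffs = [b - a for a, b in zip(readings, readings[1:])]
    return mean(readings), stdev(diffs)

def hourly_windows(series, size=WINDOW):
    """Split a long per-minute series into consecutive, non-overlapping windows."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]

def z_score(value, reference):
    """How many standard deviations `value` lies from the mean of `reference`."""
    sd = stdev(reference)
    return 0.0 if sd == 0 else (value - mean(reference)) / sd

def score_hour(training_series, last_hour, limit=3.0):
    """Compare one hour against every hourly window of the training data."""
    feats = [window_features(w) for w in hourly_windows(training_series)]
    level, vol = window_features(last_hour)
    z_level = z_score(level, [f[0] for f in feats])
    z_vol = z_score(vol, [f[1] for f in feats])
    flags = []
    if z_level > limit:
        flags.append("too high")
    if z_level < -limit:
        flags.append("too low")
    if z_vol > limit:
        flags.append("too volatile")
    if z_vol < -limit:
        flags.append("too steady")
    return z_level, z_vol, flags

# Usage with synthetic data (invented values, not real sensor readings).
rng = random.Random(0)
training = [20 + rng.gauss(0, 0.2) for _ in range(30 * 24 * 60)]  # 30 days
hot_hour = [30 + rng.gauss(0, 0.2) for _ in range(WINDOW)]
flat_hour = [20.0] * WINDOW  # a sensor that stopped changing
print(score_hour(training, hot_hour))
print(score_hour(training, flat_hour))
```

Measuring volatility on consecutive differences rather than on the raw readings keeps a slow, steady drift within the hour from being counted as volatility.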

In a second stage, I will need to analyze the data by day of the week (assuming the temperature follows weekly "trends").
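For the weekly stage, a simple extension is to keep a separate baseline per weekday, so that an hour is compared only with hours from the same day of the week. This is a sketch under assumptions: timestamped readings, one per minute, starting on a full hour. `weekday_baselines`, `weekday_z` and `hour_at` are hypothetical helper names, and the weekday/weekend temperatures are invented purely for illustration.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev

def weekday_baselines(timed_readings, size=60):
    """Mean and spread of the hourly average temperature per weekday (0=Mon ... 6=Sun).

    Assumes a time-ordered list of (datetime, temperature) pairs, one per minute,
    starting on a full hour, so every window lies within a single day.
    """
    groups = defaultdict(list)
    for i in range(0, len(timed_readings) - size + 1, size):
        window = timed_readings[i:i + size]
        groups[window[0][0].weekday()].append(mean(t for _, t in window))
    return {day: (mean(v), stdev(v)) for day, v in groups.items() if len(v) >= 2}

def weekday_z(baselines, last_hour):
    """z-score of the last hour's mean against the baseline for its own weekday."""
    mu, sd = baselines[last_hour[0][0].weekday()]
    level = mean(t for _, t in last_hour)
    return 0.0 if sd == 0 else (level - mu) / sd

# Invented example: about 20 degrees C on weekdays, 15 degrees C at weekends.
rng = random.Random(1)
start = datetime(2024, 1, 1)  # a Monday
training = []
for m in range(30 * 24 * 60):
    ts = start + timedelta(minutes=m)
    base = 15 if ts.weekday() >= 5 else 20
    training.append((ts, base + rng.gauss(0, 0.2)))

def hour_at(ts, temp):
    """Hypothetical helper: one hour of noisy readings around `temp`."""
    return [(ts + timedelta(minutes=m), temp + rng.gauss(0, 0.2)) for m in range(60)]

baselines = weekday_baselines(training)
saturday = hour_at(datetime(2024, 2, 3, 10), 20)  # 20 degrees on a Saturday
monday = hour_at(datetime(2024, 2, 5, 10), 20)    # 20 degrees on a Monday
print(weekday_z(baselines, saturday), weekday_z(baselines, monday))
```

The same 20 degree hour is normal on a Monday but stands out strongly on a Saturday. With only 30 days of history there are four or five days per weekday, so splitting further by hour of day would leave very few windows in each group.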

Thanks for any hint,

Max


    User: "MariusHelf"
    New Altair Community Member
    Accepted Answer
    Hi Max,

    you should have a look at the Outlier operators, especially Outlier Detection (LOF). It calculates the Local Outlier Factor for each example, a numeric score in which higher values indicate that the example is more likely to be an outlier.
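To show what the Local Outlier Factor measures, here is a plain-Python sketch of the standard LOF definition (k-distance, reachability distance, local reachability density). It is not RapidMiner's implementation, which may differ in details such as tie handling and parameter names. The neighbourhood size `k=5` and the hourly feature vectors are assumptions for illustration.

```python
from math import dist

def lof_scores(points, k=5):
    """Local Outlier Factor per point; values well above 1 suggest outliers.

    Plain O(n^2) version for illustration; ties among neighbours are broken by index.
    """
    n = len(points)
    neighbours, k_distance = [], []
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        others = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        nearest = others[:k]
        neighbours.append([j for _, j in nearest])
        k_distance.append(nearest[-1][0])  # distance to the k-th nearest neighbour

    def reach_dist(i, j):
        # Reachability distance of i from j: never smaller than j's own k-distance.
        return max(k_distance[j], dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        avg = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / max(avg, 1e-12))  # guard against duplicate points

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical hourly feature vectors: (mean temperature, volatility).
hours = [(20 + 0.1 * i, 0.3 + 0.01 * j) for i in range(5) for j in range(5)]
hours.append((25.0, 0.3))  # one hour far warmer than all the others
scores = lof_scores(hours, k=5)
print(max(range(len(hours)), key=scores.__getitem__))  # → 25
```

In practice the features would normally be normalized first; otherwise the feature with the largest numeric range dominates the distances.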
    You can then manually create a label that is true for all LOF values above a chosen threshold and false otherwise. If you then build a descriptive model, e.g. a decision tree, that classifies the examples as true or false, the model will show you why those examples are outliers.
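The labelling step and the descriptive model can be sketched as follows. As a minimal stand-in for a full decision tree, this uses a depth-1 tree (a decision stump) that picks the single feature and cut point with the lowest weighted Gini impurity. The threshold of 1.5, the feature names and the scores are hypothetical; a real decision tree would keep splitting to explain more complex patterns.

```python
def label_outliers(scores, threshold=1.5):
    """Manually created label: True where the outlier score exceeds the threshold."""
    return [s > threshold for s in scores]

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_stump(rows, labels, feature_names):
    """Depth-1 decision tree: the single feature/cut that best separates the labels.

    Returns (weighted Gini impurity, feature name, cut value); lower impurity is better.
    """
    best = None
    for f, name in enumerate(feature_names):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [lab for r, lab in zip(rows, labels) if r[f] <= cut]
            right = [lab for r, lab in zip(rows, labels) if r[f] > cut]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, name, cut)
    return best

# Hypothetical hourly features (mean_temp, volatility) and outlier scores (e.g. LOF).
rows = [(20.0, 0.30), (20.1, 0.28), (19.9, 0.31), (20.2, 0.29), (20.0, 0.95), (19.8, 1.10)]
scores = [1.0, 1.1, 0.9, 1.0, 3.2, 3.8]
labels = label_outliers(scores)
print(best_stump(rows, labels, ["mean_temp", "volatility"]))
```

Here the stump splits cleanly on volatility, which says the two labelled hours are outliers because they are unusually volatile, not because they are unusually warm or cold.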

    Best regards,
    Marius
    User: "max001"
    New Altair Community Member
    OP
    Thanks a lot,
    Max