Outlier detection algorithms comparison

zzM · New Altair Community Member · edited November 5 in Community Q&A
Hello, I'm new to RapidMiner and I'm having a bit of trouble here.
I'm trying to compare outlier detection algorithms such as LOF and LoOP in terms of performance, and I have no clue how to do it.


Answers

  • YYH · Altair Employee
    Answer ✓
    Hi @zzM,

    It is not possible to evaluate the performance of an unsupervised outlier detection model if there is no label providing the ground truth.

    Check out this research paper for a comprehensive overview of the anomaly detection models that are available in the Anomaly Detection extension.

    YY
  • zzM · New Altair Community Member
    Thank you for your answer @yyhuang; this research paper will surely help me a lot.
    One more thing: is there a way to compare the outlier detection algorithms using AUC as a performance measure?
  • Telcontar120 · New Altair Community Member
    Answer ✓
    Without a binary classification problem that has a priori answers (the label) to which you are comparing a prediction (the score), it is not possible to produce the ROC/AUC performance metric.  So the only way to produce that would be to separately label all cases as to whether they were in fact outliers in your opinion based on whatever criteria you are using, and then treat the output from the different outlier algorithms as though they were predictive models.  This is the main difference between supervised and unsupervised machine learning problems, which is what @yyhuang was talking about before.  So the short answer to your question is "not unless you dramatically change the nature of the problem."
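    To illustrate that idea outside of RapidMiner, here is a minimal sketch in Python with scikit-learn. It assumes you have already labeled the data yourself (the y_true array below is synthetic), and it uses Isolation Forest as the second detector because scikit-learn does not ship LoOP; the only point is to show unsupervised outlier scores being treated as predictions and compared via ROC AUC.

        import numpy as np
        from sklearn.neighbors import LocalOutlierFactor
        from sklearn.ensemble import IsolationForest
        from sklearn.metrics import roc_auc_score

        # Toy data: 200 inliers plus 10 injected anomalies, with hand-made ground truth.
        rng = np.random.default_rng(42)
        X = np.vstack([rng.normal(0, 1, size=(200, 2)),
                       rng.uniform(-6, 6, size=(10, 2))])
        y_true = np.r_[np.zeros(200), np.ones(10)]   # 1 = outlier (your manual labels)

        # LOF: negative_outlier_factor_ is larger for normal points, so negate it
        # to get a score where higher means "more anomalous".
        lof = LocalOutlierFactor(n_neighbors=20)
        lof.fit(X)
        lof_scores = -lof.negative_outlier_factor_

        # Isolation Forest: score_samples is also higher for normal points, so negate it too.
        iso = IsolationForest(random_state=42).fit(X)
        iso_scores = -iso.score_samples(X)

        # Treat each detector's scores as if they were predictions from a classifier.
        print("LOF AUC:             ", roc_auc_score(y_true, lof_scores))
        print("Isolation Forest AUC:", roc_auc_score(y_true, iso_scores))

    Whichever detector reaches the higher AUC ranks better for that particular labeling, but as noted above the comparison is only as trustworthy as the labels you created by hand.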