First steps. Need help in clustering

Antonios1
Antonios1 New Altair Community Member
edited November 2024 in Community Q&A

Hi,

I created a fictitious dataset using Excel's RANDBETWEEN function. It has 18000 rows and two columns. Column A contains IDs with values between 1 and 100. Column B contains a hypothetical expense amount between 0 and 50000 for each ID, except for ID 100, whose expense amounts fall in a narrower range, between 48000 and 50000.

Suppose I did not know how the dataset was built and wanted to find out whether one or more IDs show an anomalous concentration of values (that is, I would like the analysis to flag ID 100, whose amounts are concentrated between 48000 and 50000). What kind of analysis should I perform? I tried clustering (k-means) without success, probably because I do not know the right steps to follow. Could somebody help me?
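For readers who want to reproduce the setup outside Excel, here is a minimal Python sketch of the dataset described above, followed by a simple per-ID spread check. The function names (`make_dataset`, `spread_by_id`), the fixed seed, and the choice of standard deviation as the spread measure are assumptions made for illustration; they are not part of the original question.

```python
import random
import statistics

def make_dataset(n_rows=18000, n_ids=100, seed=0):
    """Mimic the RANDBETWEEN sheet: a list of (id, amount) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        id_ = rng.randint(1, n_ids)
        # The last ID gets the narrow range 48000..50000; all others 0..50000.
        low = 48000 if id_ == n_ids else 0
        rows.append((id_, rng.randint(low, 50000)))
    return rows

def spread_by_id(rows):
    """Return {id: (min, max, population stdev)} of the amounts per ID."""
    groups = {}
    for id_, amount in rows:
        groups.setdefault(id_, []).append(amount)
    return {i: (min(v), max(v), statistics.pstdev(v)) for i, v in groups.items()}

rows = make_dataset()
summary = spread_by_id(rows)
# The ID whose amounts vary least is the candidate "concentrated" ID.
narrowest = min(summary, key=lambda i: summary[i][2])
print(narrowest, summary[narrowest])
```

With roughly 180 rows per ID, the ID with the smallest spread should be ID 100, since every other ID draws from the full 0 to 50000 range.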

Best Answer

  • Telcontar120
    Telcontar120 New Altair Community Member
    Answer ✓
    Try some of the anomaly detection operators available in the free Anomaly Detection extension. LOF (Local Outlier Factor) might be particularly useful in this kind of context.
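To give a feel for what LOF computes, here is a small self-contained Python sketch. It is not the RapidMiner operator: it is a plain O(n^2) implementation of the textbook Local Outlier Factor, and it is applied to one point per ID (mean and standard deviation of the amounts) rather than to the raw rows. That aggregation step, the value `k=10`, the seed, and the helper name `lof_scores` are all assumptions made for illustration.

```python
import math
import random
import statistics

def lof_scores(points, k=10):
    """Plain O(n^2) Local Outlier Factor; returns one score per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point (excluding itself), nearest first.
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(max(k_dist[j], dist[i][j]) for j in neigh[i]) for i in range(n)]
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]

# Rebuild a dataset like the one in the question (assumed seed).
rng = random.Random(0)
amounts = {}
for _ in range(18000):
    id_ = rng.randint(1, 100)
    low = 48000 if id_ == 100 else 0
    amounts.setdefault(id_, []).append(rng.randint(low, 50000))

# One feature vector per ID: (mean amount, population stdev of amounts).
ids = sorted(amounts)
points = [(statistics.mean(amounts[i]), statistics.pstdev(amounts[i])) for i in ids]
scores = lof_scores(points)
top_id, top_score = max(zip(ids, scores), key=lambda t: t[1])
print(top_id, round(top_score, 1))
```

Scores close to 1 mean a point sits in a region about as dense as its neighbours' regions; a score well above 1 marks a point that is isolated relative to its neighbours. In this sketch ID 100 lands far from the other 99 IDs in (mean, stdev) space, so it should receive the highest score.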

Answers

  • Antonios1
    Antonios1 New Altair Community Member
    Thanks for helping, Brian. I am really new to RapidMiner and AI, so forgive me if I do not use the right terms. Unfortunately, I was unable to test the LOF operator. I downloaded the Anomaly Detection extension and used the LOF operator: I connected my file's out port to the exa port on the LOF operator, and connected the operator's exa port to the res port. The process seemed to take a long time to produce an output, so I stopped it after a few hours. I ran it again this morning before going to work and, when I got back at one, I found that the software had crashed. I have launched it again to see how it proceeds. It has now been running for about 1 hour and is still going. The PC is an i7 with 16 GB of RAM.


  • Antonios1
    Antonios1 New Altair Community Member
    Thank you, Brian. It works. I was able to run the operator on a different PC and it worked correctly. The result also seems quite straightforward to interpret.