First steps: need help with clustering

Hi,
I created a fictitious dataset using Excel's RANDBETWEEN function. The dataset has 18000 rows and two columns. Column A contains IDs with values between 1 and 100. Column B contains a hypothetical expense amount between 0 and 50000 for each ID, except for ID 100, whose expense range in column B is narrower: between 48000 and 50000.
Suppose I didn't know how the dataset was built and wanted to see whether one or more IDs have an anomalous concentration of values (that is, I would like the analysis to spot ID 100, with its amounts concentrated between 48000 and 50000). What kind of analysis should I perform? I tried clustering (k-means), but without success; I probably do not know the right steps to follow. Could somebody help me?
Best Answer
-
Try some of the operators in the anomaly detection methods available in the free extension of that name. LOF might be particularly useful in this type of context.
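For anyone who wants to see the LOF idea outside RapidMiner, here is a minimal Python sketch. It is not the RapidMiner operator and not necessarily how the extension works internally. It assumes one possible setup: rebuild the dataset roughly as described in the question, summarise each ID by the mean and standard deviation of its amounts, and then score those 100 summaries with a plain Local Outlier Factor. The function names (`make_rows`, `per_id_features`, `lof_scores`), the seed, and `k=10` are all illustrative assumptions.

```python
import math
import random
import statistics
from collections import defaultdict


def make_rows(n_rows=18000, seed=0):
    """Mimic the Excel RANDBETWEEN setup from the question (assumed layout)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        id_ = rng.randint(1, 100)
        # ID 100 gets the narrow 48000-50000 band; all other IDs get 0-50000.
        low = 48000 if id_ == 100 else 0
        rows.append((id_, rng.randint(low, 50000)))
    return rows


def per_id_features(rows):
    """Summarise each ID as (mean amount, standard deviation of amount)."""
    by_id = defaultdict(list)
    for id_, amount in rows:
        by_id[id_].append(amount)
    return {i: (statistics.mean(v), statistics.stdev(v)) for i, v in by_id.items()}


def lof_scores(points, k=10):
    """Plain O(n^2) Local Outlier Factor; ties at the k-th distance are ignored."""
    labels = list(points)
    # k nearest neighbours of every point, as sorted (distance, label) pairs.
    neighbours = {}
    for p in labels:
        dists = sorted((math.dist(points[p], points[q]), q) for q in labels if q != p)
        neighbours[p] = dists[:k]
    k_dist = {p: neighbours[p][-1][0] for p in labels}
    # Local reachability density: inverse of the mean reachability distance.
    lrd = {}
    for p in labels:
        reach = [max(k_dist[q], d) for d, q in neighbours[p]]
        lrd[p] = 1.0 / (sum(reach) / len(reach))
    # LOF: average density of the neighbours relative to the point's own density.
    return {
        p: sum(lrd[q] for _, q in neighbours[p]) / (len(neighbours[p]) * lrd[p])
        for p in labels
    }


features = per_id_features(make_rows())
scores = lof_scores(features)
top_id = max(scores, key=scores.get)
print("most anomalous ID:", top_id, "LOF:", round(scores[top_id], 2))
```

Scores close to 1 mean an ID looks about as dense as its neighbours, while clearly larger scores mark it as an outlier. Scoring per-ID summaries instead of the raw rows is a design choice of this sketch: it also keeps the quadratic distance computation down to 100 points rather than 18000.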
Answers
-
Thanks for helping, Brian. I am really new to RapidMiner and AI, so forgive me if I do not use the correct terms. Unfortunately, I was unable to test the LOF operator. I downloaded the anomaly detection extension and used the LOF operator. I connected my file's out port to the exa port on the LOF operator, and connected the operator's exa port to the res port. The process seemed to take a long time to produce an output, so I stopped it after a few hours. I ran it again this morning before going to work and, when I got back, I found the software had crashed. I have launched it again to see how it proceeds. It has now been running for about 1 hour and is still going. The PC is an i7 with 16 GB of RAM.
-
Thank you, Brian. It works. I was able to run the operator on a different PC and it worked correctly. The result also seems quite easy to interpret.