
First steps: need help with clustering

hi,

I created a fictitious dataset using Excel's RANDBETWEEN function. The dataset has 18000 rows and two columns. Column A contains IDs with values between 1 and 100. Column B contains a hypothetical expense amount between 0 and 50000 for each ID, except ID 100, whose expenses fall in a narrower range, between 48000 and 50000.
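For anyone who wants to reproduce the experiment outside Excel, here is a minimal Python sketch of the dataset as described. It assumes the IDs are drawn with the equivalent of RANDBETWEEN(1,100) and the expenses with RANDBETWEEN(0,50000), or RANDBETWEEN(48000,50000) for ID 100. The function name `make_dataset` and the fixed seed are illustrative choices, not part of the original post.

```python
import random


def make_dataset(n_rows=18000, seed=42):
    """Rebuild the fictitious dataset (assumed layout, see lead-in).

    Column A: ID drawn uniformly from 1..100.
    Column B: expense drawn uniformly from 0..50000, except for ID 100,
    whose expense is drawn from the narrower range 48000..50000.
    """
    rng = random.Random(seed)  # fixed seed so every run gives the same rows
    rows = []
    for _ in range(n_rows):
        id_ = rng.randint(1, 100)  # randint includes both ends, like RANDBETWEEN
        if id_ == 100:
            expense = rng.randint(48000, 50000)
        else:
            expense = rng.randint(0, 50000)
        rows.append((id_, expense))
    return rows


rows = make_dataset()
print(len(rows), rows[:3])
```

Python's `random.randint(a, b)` includes both end points, just like RANDBETWEEN, so the ranges match the ones in the post.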

Let's suppose I don't know how the dataset is composed and I want to see whether one or more IDs show an anomalous concentration of values (that is, I would like the analysis to spot ID 100, with its expenses concentrated between 48000 and 50000). What kind of analysis should I perform? I tried clustering (k-means) without success; I probably don't know the steps to follow. Could somebody help me?
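One simple way to make this question concrete, sketched here under stated assumptions, is to profile each ID rather than each row: group the expenses by ID, measure how spread out each ID's expenses are, and flag IDs whose spread is far from the others. This is a swapped-in technique (a per-ID summary scored with a median/MAD "robust z-score"), not k-means and not the method recommended in the replies. The data is regenerated with the same assumed layout so the block runs on its own, and the 3.5 cut-off is a common convention rather than a tuned value.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical stand-in for the spreadsheet, using the layout assumed in the post.
rng = random.Random(0)
rows = []
for _ in range(18000):
    id_ = rng.randint(1, 100)
    low = 48000 if id_ == 100 else 0  # ID 100 only gets the narrow range
    rows.append((id_, rng.randint(low, 50000)))

# 1. Group the expenses by ID.
by_id = defaultdict(list)
for id_, expense in rows:
    by_id[id_].append(expense)

# 2. Summarize each ID by the spread of its expenses (sample standard deviation).
spread = {id_: statistics.stdev(vals) for id_, vals in by_id.items()}

# 3. Score each ID's spread against the others with a robust z-score
#    (median and median absolute deviation), so the anomalous ID
#    does not distort the baseline it is compared with.
values = list(spread.values())
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)
robust_z = {id_: (s - med) / (1.4826 * mad) for id_, s in spread.items()}

# 4. Flag IDs whose spread is far from typical (3.5 is a commonly used cut-off).
flagged = sorted(id_ for id_, z in robust_z.items() if abs(z) > 3.5)
most_unusual = max(robust_z, key=lambda k: abs(robust_z[k]))
print("flagged IDs:", flagged)
print("most unusual ID:", most_unusual, round(robust_z[most_unusual], 1))
```

Because ID 100's expenses span about 2000 units instead of about 50000, its standard deviation is far below the others and it gets a large negative score. Other per-ID statistics (mean, min, max) could be scored the same way if the anomaly you are looking for is a shift rather than a narrowing.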


Accepted answers

Telcontar120

Try some of the operators in the free Anomaly Detection extension. LOF (Local Outlier Factor) might be particularly useful in this type of context.
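To show what LOF computes, here is a small, naive pure-Python sketch of the Local Outlier Factor. It compares each point's local density with the density of its k nearest neighbors: scores near 1 mean "about as dense as the neighborhood", and scores well above 1 mean "much sparser than the neighborhood". This is not the Anomaly Detection extension's implementation; the helper names, the min-max rescaling step, k=3, and the toy grid data are all illustrative assumptions.

```python
import math


def min_max_scale(points):
    """Rescale every column to [0, 1] so no single column dominates the distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return [tuple((v - lo) / sp for v, lo, sp in zip(p, lows, spans)) for p in points]


def lof_scores(points, k=3):
    """Naive Local Outlier Factor for a list of numeric tuples.

    Simplification: exactly k neighbors are kept per point, so distance ties
    are broken by input order instead of including every tied point.
    """
    n = len(points)
    # k nearest neighbors of each point (excluding itself): O(n^2) distances.
    neighbors = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nbrs[-1][0] for nbrs in neighbors]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))  # guard against duplicate points
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Toy data: a tight 5x5 grid plus one far-away point.
grid = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
data = min_max_scale(grid + [(3.0, 3.0)])
scores = lof_scores(data, k=3)
print("isolated point:", round(scores[-1], 1))
print("largest grid score:", round(max(scores[:-1]), 2))
```

This version computes the distance between every pair of points (O(n²) comparisons), so on 18000 rows it would be slow in pure Python; trying it on a sample first is sensible. The rescaling step matters for data like the one in the question, because unscaled expenses (0 to 50000) would otherwise swamp the ID column (1 to 100) in the Euclidean distance.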

All comments

Antonios1

Thanks for your help, Brian. I am really new to RapidMiner and AI, so forgive me if I don't use the right terms. Unfortunately, I have not yet been able to get results from the LOF operator. I downloaded the Anomaly Detection extension and used the LOF operator: I connected my file's out port to the exa port on the LOF operator, and connected the LOF operator's exa port to the res port. The process was taking a very long time to produce any output, so I stopped it after a few hours. I ran it again this morning before going to work, and when I got back at one I found the software had crashed. I have launched it again to see how it proceeds; it has now been running for about an hour and is still going. The PC is an i7 with 16 GB of RAM.

Antonios1

Thank you, Brian. It works: I was able to run the operator on a different PC and it worked correctly. The result also seems quite easy to interpret.