Hi,
I created a fictitious dataset using Excel's RANDBETWEEN function. The dataset has 18000 rows and two columns. Column A contains IDs with values between 1 and 100. Column B contains a hypothetical expense amount between 0 and 50000 for every ID, except ID 100, whose expense range is narrower: between 48000 and 50000.
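For anyone who wants to reproduce the setup outside Excel, here is a minimal Python sketch that mimics it. It assumes uniform integer draws like RANDBETWEEN, whose bounds are inclusive, as are those of Python's `random.randint`. The function name `make_dataset` and the fixed seed are illustrative choices, not part of the original workbook.

```python
import random

def make_dataset(n_rows=18000, n_ids=100, seed=0):
    """Mimic the Excel setup: column A = ID in [1, n_ids],
    column B = expense in [0, 50000], except the last ID (100),
    whose expense is drawn from the narrower range [48000, 50000]."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    rows = []
    for _ in range(n_rows):
        id_ = rng.randint(1, n_ids)  # like RANDBETWEEN(1, 100)
        if id_ == n_ids:
            amount = rng.randint(48000, 50000)  # the "anomalous" ID
        else:
            amount = rng.randint(0, 50000)
        rows.append((id_, amount))
    return rows
```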
Suppose I didn't know how the dataset was built and wanted to check whether one or more IDs show an anomalous concentration of values (that is, I would like the analysis to spot ID 100, whose amounts are concentrated between 48000 and 50000). What kind of analysis should I perform? I tried clustering (k-means), without success; probably I don't know the right steps to follow.
Could somebody help me?
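One possible approach, sketched in Python under my own assumptions rather than as a definitive method: k-means on the raw (ID, amount) pairs probably struggles because the ID is a label, not a meaningful numeric coordinate, and ID 100's amounts fall inside the range the other IDs also cover. Instead, compute one summary per ID (here the standard deviation of its amounts) and flag IDs whose summary is far from the rest, using a robust z-score built from the median and the median absolute deviation (MAD). The function `flag_anomalous_ids` and the cutoff of 3.5 are hypothetical choices; other per-ID summaries such as the range or interquartile range could be plugged in the same way.

```python
import statistics

def flag_anomalous_ids(rows, threshold=3.5):
    """Flag IDs whose expense spread differs strongly from the other IDs.

    rows: iterable of (id, amount) pairs, e.g. Excel columns A and B.
    Returns a sorted list of flagged IDs.
    """
    # Group the amounts by ID.
    by_id = {}
    for id_, amount in rows:
        by_id.setdefault(id_, []).append(amount)

    # One summary feature per ID: the standard deviation of its amounts.
    spread = {i: statistics.pstdev(v) for i, v in by_id.items() if len(v) > 1}
    values = list(spread.values())

    # Robust centre and scale: median and median absolute deviation.
    med = statistics.median(values)
    mad = statistics.median([abs(s - med) for s in values])
    if mad == 0:
        return []  # all IDs look alike; nothing to flag

    # 0.6745 rescales MAD so the score is comparable to an ordinary z-score.
    flagged = [i for i, s in spread.items()
               if abs(0.6745 * (s - med) / mad) > threshold]
    return sorted(flagged)
```

With data shaped like the one described (about 180 rows per ID on average), ID 100's standard deviation is roughly 2000/sqrt(12), about 577, versus roughly 50000/sqrt(12), about 14434, for the other IDs, so it should stand out clearly.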