Using clustering to check for fraud
Hi,
I am trying to detect expense-claim fraud using RapidMiner. I am not sure which modelling technique is suitable, so I tried k-means clustering.
I have a large dataset with the following attributes. Only amount is numeric, and from my understanding k-means can only analyze numeric attributes.
- date
- employee
- amount
- expense type
- etc.
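If the nominal attributes above (employee, expense type) should also feed into k-means, one standard workaround is dummy (one-hot) coding: each category becomes its own 0/1 numeric column. RapidMiner's Nominal to Numerical operator offers this kind of coding; the sketch below shows the idea in plain Python. The record layout and field names (`employee`, `amount`, `expense_type`) are hypothetical, chosen only to mirror the list above.

```python
# Hypothetical claim records; field names mirror the attributes listed above.
claims = [
    {"employee": "E1", "amount": 120.0, "expense_type": "meals"},
    {"employee": "E1", "amount": 950.0, "expense_type": "travel"},
    {"employee": "E2", "amount": 45.0, "expense_type": "meals"},
]

def one_hot(records, nominal_field):
    """Replace a nominal field with one 0/1 indicator column per distinct value."""
    categories = sorted({r[nominal_field] for r in records})
    encoded = []
    for r in records:
        # Copy every other field unchanged.
        row = {k: v for k, v in r.items() if k != nominal_field}
        # Add one indicator column per category, in sorted order.
        for c in categories:
            row[f"{nominal_field}={c}"] = 1.0 if r[nominal_field] == c else 0.0
        encoded.append(row)
    return encoded

encoded = one_hot(claims, "expense_type")
print(encoded[1])
# {'employee': 'E1', 'amount': 950.0, 'expense_type=meals': 0.0, 'expense_type=travel': 1.0}
```

Because k-means is distance-based, raw amounts would dominate 0/1 indicator columns, so the numeric attributes would normally be normalized before clustering.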
I have built the process and produced an output. Basically, I filter to one employee at a time and select only the amount attribute.
Question: How can I analyze the output to detect whether there are any fraudulent claims?
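The per-employee process described above can be sketched outside RapidMiner as a one-dimensional k-means on each employee's amounts. One common heuristic for reading the cluster output is to treat claims that land in a very small cluster as candidates for review. This is a minimal sketch under stated assumptions, not RapidMiner's implementation: `k=2`, the `max_share=0.2` threshold, the deterministic centroid seeding and the helper names `kmeans_1d` / `flag_small_clusters` are all hypothetical choices for illustration.

```python
from collections import defaultdict

def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on a list of numbers (hypothetical helper)."""
    s = sorted(values)
    # Deterministic seeding: k values spread evenly across the sorted list.
    centroids = [s[int(i * (len(s) - 1) / max(k - 1, 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break  # converged
        centroids = new
    return labels, centroids

def flag_small_clusters(claims, k=2, max_share=0.2):
    """Per employee, flag claims whose cluster holds <= max_share of that employee's claims."""
    by_emp = defaultdict(list)
    for c in claims:
        by_emp[c["employee"]].append(c)
    flagged = []
    for rows in by_emp.values():
        amounts = [r["amount"] for r in rows]
        if len(amounts) < k:
            continue  # too few claims to cluster
        labels, _ = kmeans_1d(amounts, k)
        sizes = defaultdict(int)
        for lab in labels:
            sizes[lab] += 1
        for r, lab in zip(rows, labels):
            if sizes[lab] / len(rows) <= max_share:
                flagged.append(r)
    return flagged

# Hypothetical data: E1 has one claim far above their usual amounts.
claims = [{"employee": "E1", "amount": a} for a in (50.0, 55.0, 48.0, 60.0, 52.0, 900.0)]
claims += [{"employee": "E2", "amount": a} for a in (100.0, 110.0, 105.0)]
flagged = flag_small_clusters(claims)
print([(r["employee"], r["amount"]) for r in flagged])  # [('E1', 900.0)]
```

A flagged claim is only unusual for that employee, not proven fraudulent; distance to the cluster centroid is another common way to rank claims from the same clustering.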
Thanks.
Fraud is always a great use case, but fraudulent cases can be tricky to find. Have you tried the Anomaly Detection extension? It has a great HBOS (Histogram-Based Outlier Score) operator.
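To illustrate what an HBOS score measures, here is a minimal sketch with static equal-width bins: for each feature, a histogram is built and normalized so the tallest bin has height 1, and a record's score is the sum over features of log(1 / bin height). Rare values therefore score high. This is not the extension's implementation (its operator has its own binning options); the function name `hbos_scores`, the `n_bins=10` default and the sample amounts are hypothetical.

```python
import math

def hbos_scores(rows, n_bins=10):
    """Histogram-Based Outlier Score with static equal-width bins.
    rows: list of equal-length numeric feature vectors. Higher score = more anomalous."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for f in range(n_features):
        col = [r[f] for r in rows]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant columns
        # Map each value to a bin; the maximum value goes into the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in col]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            density = counts[b] / peak  # normalized so the tallest bin has height 1
            scores[i] += math.log(1.0 / density)
    return scores

# One feature (amount) for a single employee; more columns would simply add up.
amounts = [[50.0], [55.0], [48.0], [60.0], [52.0], [900.0]]
scores = hbos_scores(amounts)
print([round(s, 3) for s in scores])  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.609]
```

Because features are scored independently, HBOS is fast on large data but ignores interactions between attributes, which is the usual trade-off cited for it.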