Hi,
We are trying to build a revenue assurance predictive model to identify possible electricity theft. Our approach is to take the hourly reads of meters already known to be involved in theft and predict whether any other meters follow similar usage patterns (anomaly detection and pattern matching against known fraud).
We have around 400 known theft meters and 110k unknown meters (the example set), so the known cases are a very small fraction of the data we need to match against. I have tried KNN, GBT and Naive Bayes, tracking performance with "Performance Binominal classification" (i.e. LABEL = FRAUD = TRUE/FALSE). I also tried SVM, as recommended by most research papers, and its performance was terrible. I am now trying parameter optimization, and it has been running for 2 days :-(
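To show why this ratio worries me when reading the performance numbers, here is a minimal Python sketch (not our actual process). Only the 400 / 110k split comes from our data. The function name `binary_metrics` and the confusion counts in the second call are made up for illustration, and treating all 110k unknown meters as non-theft is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the FRAUD=TRUE class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

KNOWN_THEFT = 400
UNKNOWN = 110_000  # assumption: every unknown meter is counted as non-theft

# A model that never predicts theft at all.
never_theft = binary_metrics(tp=0, fp=0, fn=KNOWN_THEFT, tn=UNKNOWN)

# Hypothetical counts for a model that finds half of the known theft meters
# while raising 1,000 false alarms (illustrative numbers, not a real result).
some_model = binary_metrics(tp=200, fp=1_000, fn=200, tn=109_000)

for name, metrics in [("never_theft", never_theft), ("some_model", some_model)]:
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

A model that never flags anything still reaches about 99.6% accuracy here, so precision and recall on the FRAUD=TRUE class seem more informative than accuracy for comparing the algorithms.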
Below are my questions:
(1) What would be the best supervised machine learning algorithms for this kind of classification problem?
(2) Also, how do we feed the confirmed false positive meters back to the model as not theft, so that the model refines itself, starts treating them as not theft, and yields better predictions? I would appreciate it if you could share a sample process showing how to feed this back to the model.
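To make question (2) concrete, the kind of feedback loop I have in mind could be sketched in Python as follows. This is only an illustration under stated assumptions: the meter IDs, the two-number feature vectors, and the helper names (`apply_feedback`, `train_centroids`, `predict`) are all hypothetical, and the nearest-centroid "model" is a toy stand-in for whichever learner is actually used.

```python
from statistics import fmean

def apply_feedback(labels, inspected):
    """Merge field-inspection results into the label table.

    labels:    meter_id -> True (theft), False (confirmed not theft),
               or None (never inspected)
    inspected: meter_id -> True/False as confirmed by an investigation
    """
    updated = dict(labels)
    updated.update(inspected)
    return updated

def train_centroids(features, labels):
    """Toy learner: one mean feature vector per confirmed class."""
    centroids = {}
    for cls in (True, False):
        rows = [features[m] for m, y in labels.items() if y is cls]
        centroids[cls] = [fmean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, profile):
    """Flag as theft if the profile is closer to the theft centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return dist(profile, centroids[True]) < dist(profile, centroids[False])

# Hypothetical per-meter features summarising the hourly reads.
features = {
    "m1": [0.10, 0.20], "m2": [0.20, 0.10],  # known theft
    "m3": [0.90, 0.80], "m4": [0.80, 0.90],  # confirmed not theft
    "m5": [0.45, 0.45], "m6": [0.50, 0.45],  # never inspected
}
labels = {"m1": True, "m2": True, "m3": False, "m4": False,
          "m5": None, "m6": None}

# Round 1: train on confirmed labels only and score an unknown meter.
before = predict(train_centroids(features, labels), features["m5"])

# An investigation finds that m5 was a false positive: record it and retrain.
labels = apply_feedback(labels, {"m5": False})
after_model = train_centroids(features, labels)
after = predict(after_model, features["m5"])
after_similar = predict(after_model, features["m6"])

print("m5 flagged before feedback:", before)
print("m5 flagged after feedback:", after)
print("m6 flagged after feedback:", after_similar)
```

The idea is that each confirmed false positive becomes a labelled not-theft example before the next training run, so that meters with a similar profile (m6 here) are also scored differently. What I would like to know is how to set up the equivalent of this as a process.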
Thanks for the valuable input.