reports01 wrote: I have a dataset in which only a certain percentage of cases has a known outcome (say, 'good' or 'bad'); for a large group the outcome is unknown (say, 'rejects' in credit risk, or 'indeterminates' in any other project). Because I want to include this last group in my modelling (so that my data sample gives a complete view of the dataset), I would have to use a technique called "reject inferencing". Doing so gives me a more trustworthy model, one based on my entire customer base.
reports01 wrote: But as I am using these cases to model on, I want a probability weighting on these cases that reflects their likely behaviour, based on the variables used.
This creates a training set that is representative of the entire population. It is something that should be done in the pre-modelling stage.
So the known cases (good and bad) get a weight of 100%, and the unknown cases get a likelihood weight of good behaviour between 0 and 100%.
How do I do this in rapid-i?
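The weighting scheme described above can be illustrated outside any particular tool. The following is a minimal Python sketch, not a rapid-i process. It assumes the probability of good behaviour for each unknown case has already been estimated somehow, and it uses one possible reading of the weighting (sometimes called fuzzy augmentation): each unknown case enters the training set twice, once as 'good' with weight p and once as 'bad' with weight 1 - p. The names `Case`, `p_good` and `build_weighted_training_set`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    features: tuple                  # e.g. (age, income, loan); made-up values below
    outcome: Optional[str]           # "good", "bad", or None when the outcome is unknown
    p_good: Optional[float] = None   # assumed to be estimated beforehand for unknown cases


def build_weighted_training_set(cases):
    """Return (features, label, weight) rows for a weighted learner.

    Known cases keep their label with weight 1.0 (100%).
    Unknown cases are split into a 'good' row with weight p_good
    and a 'bad' row with weight 1 - p_good.
    """
    rows = []
    for case in cases:
        if case.outcome is not None:
            rows.append((case.features, case.outcome, 1.0))
            continue
        if case.p_good is None or not 0.0 <= case.p_good <= 1.0:
            raise ValueError("unknown cases need a p_good between 0 and 1")
        rows.append((case.features, "good", case.p_good))
        rows.append((case.features, "bad", 1.0 - case.p_good))
    return rows


# Hypothetical example: two known cases and one reject with p_good = 0.7.
cases = [
    Case((35, 52000, 10000), "good"),
    Case((22, 18000, 15000), "bad"),
    Case((41, 30000, 12000), None, p_good=0.7),
]
training_rows = build_weighted_training_set(cases)
```

The resulting rows can be passed to any learner that accepts per-example weights. Whether splitting each unknown case into two weighted rows is appropriate depends on the project; a single row with the more likely label and a reduced weight would be another option.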
Now, by using clustering techniques (looking at the known cases), the software should be able to infer a probability of the outcome being good (G) or bad (B) based on the variables which are known (age, income, loan, etc.). After that, you can use the entire population to create a model on the outcome/inferred outcome.
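As a rough illustration of inferring that probability from the known cases, here is a small Python sketch. It swaps the clustering step for a simpler stand-in, a k-nearest-neighbour vote: the probability of 'good' for an unknown case is taken as the share of 'good' outcomes among its k most similar known cases. The function names, the choice of k and the data are all assumptions made for the example, not something taken from rapid-i.

```python
import math


def column_stats(rows):
    """Mean and standard deviation per column, used to put variables on one scale."""
    cols = list(zip(*rows))
    means = [sum(col) / len(col) for col in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0  # avoid dividing by 0
        for col, m in zip(cols, means)
    ]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def infer_p_good(known, unknown_features, k=3):
    """Estimate P(good) for one unknown case from its k nearest known cases.

    known: list of (features, outcome) pairs with outcome "good" or "bad".
    """
    means, stds = column_stats([features for features, _ in known])
    scaled_known = [(scale(f, means, stds), outcome) for f, outcome in known]
    target = scale(unknown_features, means, stds)
    nearest = sorted(scaled_known, key=lambda fo: math.dist(fo[0], target))[:k]
    return sum(1 for _, outcome in nearest if outcome == "good") / len(nearest)


# Hypothetical known cases: (age, income, loan) and the observed outcome.
known = [
    ((45, 60000, 5000), "good"),
    ((50, 75000, 8000), "good"),
    ((38, 52000, 4000), "good"),
    ((23, 18000, 15000), "bad"),
    ((27, 21000, 20000), "bad"),
    ((21, 15000, 12000), "bad"),
]
p_good = infer_p_good(known, (47, 68000, 6000), k=3)
```

With a small k the estimate can only take a few distinct values (0, 1/3, 2/3 or 1 for k = 3), so a larger k or a probabilistic model fitted on the known cases would give smoother weights. Note also that the known cases may not be representative of the rejects, which is the core difficulty of reject inference.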
Thank you Steffen,
this tool had reject inferencing embedded in the pre-analysis step, after which the modeling started (using a genetic algorithm).