Workflow: Inferring rejected loan applications with the Reject Inference block


The Reject Inference block enables you to address inherent selection bias in a model by including a rejected population. The block is used to create an inference model and make predictions on loan default using two input datasets (accepted_loans.csv and rejected_loans.csv) that contain observations describing a loan and the person taking out the loan. The accepted_loans.csv dataset contains a Default column; rejected_loans.csv does not.

The following demonstrates how to use the Reject Inference block to infer rejects into an existing logistic regression model:

  1. Import the accepted_loans.csv and rejected_loans.csv datasets onto a Workflow canvas using the Text File Import block.
  2. Expand the Model Training group in the Workflow palette, then click and drag a WoE Transform block onto the Workflow canvas.
  3. Click the accepted_loans.csv dataset block's Output port and drag a connection towards the Input port of the WoE Transform block.
  4. Double-click the WoE Transform block to display the WoE Transform Editor view.
  5. In the WoE Transform Editor view:
    1. From the Dependent variable drop-down list, select Default.
    2. From the Target Category drop-down list, select 1 (one).
    3. From the Independent Variables list, click on the drop-down boxes under Treatment and set Other_Debt to Interval, Income to Interval, Sector to Nominal, Age to Interval and Housing_Situation to Nominal.
    4. Click the Optimisation tab, then click Apply optimal binning to all variables.

    5. Press Ctrl+S to save the configuration, then close the WoE Transform Editor view. A green execution status is displayed in the Output ports of the WoE Transform block and of the Working Dataset, the dataset to which the transformation has been applied.
  6. Click and drag a Logistic Regression block onto the Workflow canvas, beneath the WoE Transform block.
  7. Click the Output port of the WoE Transform block's Working Dataset and drag a connection towards the Input port of the Logistic Regression block.
  8. Double-click the Logistic Regression block to display the Configure Logistic Regression dialog box.
  9. In the Configure Logistic Regression dialog box:
    1. From the Dependent variable drop-down list, select Default.
    2. From the Event drop-down list, select 1.
    3. From the Unselected Effect Variables list, double-click the variables Other_Debt_WOE, Income_WOE, Sector_WOE, Age_WOE and Housing_Situation_WOE to move them to the Selected Effect Variables list.
    4. In the Selected Effect Variables list, clear the Class checkbox for each variable.
    5. Click the Model Selection tab, then from the Method drop-down list, select Forward.

    6. Click OK to close the Configure Logistic Regression dialog box.
  10. Expand the Scoring group in the Workflow palette, then click and drag a Score block onto the Workflow canvas, below the rejected_loans.csv dataset block.
  11. Click the Output port of the WoE Transform block's WoE Transform output and drag a connection towards the Input port of the Score block. Repeat for the Output port of the rejected_loans.csv dataset block.

  12. Right-click the Score block's output dataset block and click Rename. Type Scored Rejects and click OK.
  13. Click and drag a Reject Inference block onto the Workflow canvas, below the Logistic Regression block and the Score block.
  14. Click the Output port of the Logistic Regression block's Logistic Regression Model output and drag a connection towards the Input port of the Reject Inference block. Repeat for the Output port of the Scored Rejects dataset.

  15. Double-click the Reject Inference block to display the Configure Reject Inference dialog box.
  16. In the Configure Reject Inference dialog box:
    1. From the Inference Method drop-down list, select Proportional Assignment.
    2. From the Bad Event drop-down list, select 1 (one).
  17. Click OK to save the configuration and close the Configure Reject Inference dialog box.

A green execution status is displayed in the Output ports of the Reject Inference block and of its outputs Inference Model, Inference Report and Inferred Rejects. You now have an inference model that considers both accepted and rejected loan applications.
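
For readers who want to see the idea behind the workflow outside the canvas, the following Python sketch illustrates one simple reading of proportional assignment: each rejected application is randomly labelled good or bad in proportion to its predicted probability of default, and the labelled rejects are then combined with the accepted loans. This is an illustration under stated assumptions, not a description of how the Reject Inference block is implemented. The function infer_rejects, the p_default column and the sample rows are all hypothetical.

```python
import random


def infer_rejects(scored_rejects, prob_key="p_default", seed=0):
    """Return copies of the scored rejects with an inferred Default label.

    Each reject is labelled 1 (bad) with probability equal to its predicted
    probability of default, and 0 (good) otherwise. This is an assumed,
    simplified form of proportional assignment.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is repeatable
    inferred = []
    for row in scored_rejects:
        label = 1 if rng.random() < row[prob_key] else 0
        inferred.append({**row, "Default": label})
    return inferred


# Hypothetical rows; the real datasets have more columns and observations.
accepted = [
    {"Income": 52000, "Age": 41, "Default": 0},
    {"Income": 31000, "Age": 29, "Default": 1},
    {"Income": 67000, "Age": 55, "Default": 0},
]

# Rejected applications with a hypothetical p_default column, assumed to hold
# a predicted probability of default from a model fitted on accepted loans only.
scored_rejects = [
    {"Income": 18000, "Age": 23, "p_default": 1.0},
    {"Income": 24000, "Age": 35, "p_default": 0.4},
    {"Income": 45000, "Age": 48, "p_default": 0.0},
]

inferred_rejects = infer_rejects(scored_rejects)

# Drop the helper column and combine both populations for refitting.
augmented = accepted + [
    {k: v for k, v in row.items() if k != "p_default"}
    for row in inferred_rejects
]
print(len(augmented), "rows available for refitting the model")
```

A model refitted on the combined rows then reflects both populations, which is the purpose of the inference model produced by the workflow above.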