Altair SLC Chapter I: Optimum binning in preparation for logistic regression
Too long to post on a listserv; see GitHub
GitHub
https://github.com/rogerjdeangelis/utl-altair-slc-chapter-I-optimum-binning-in-preparation-for-logistic-regression
Key output reports
Binned character covariates Excel report
https://github.com/rogerjdeangelis/utl-altair-slc-chapter-I-optimum-binning-in-preparation-for-logistic-regression/blob/main/lgs_MgmChrCut.xlsx
Binned numeric covariates Excel report
https://github.com/rogerjdeangelis/utl-altair-slc-chapter-I-optimum-binning-in-preparation-for-logistic-regression/blob/main/lgs_MgmNumCut.xlsx
SECTIONS
I ANALYSIS OVERVIEW
II CAMPAIGN STRATEGY
III KEY OUTPUT SAMPLE OF BINNING ONE COVARIATE
IV INTERPRETING BINNED DATA
V INPUT
VI OUTPUT FIVE TABLES
VII THEORY
VIII METHOD (program)
1 Contents of raw input data
2 Validate and verify raw data. Fix issues with raw data
3 One missing value
Drop one-to-one variables. Very low cardinality variables.
Convert the many missing-value representations to just '?'
4 Optimize variable length to the longest in the data
5 Create Holdout and Training Tables
6 Bin the character variables into groups with common odds ratios
Create normalized (long and skinny binned data with just 55 variables)
Create denormalized (wide binned table with 30 variables)
7 Create Excel character data summary bin report with chi-square and Mantel–Haenszel statistics
8 Bin the numeric variables into groups with common odds ratios
Create normalized (long and skinny binned table with just 55 variables)
Create denormalized (wide binned table with 38 variables)
9 Create Excel numeric data summary bin report with chi-square and Mantel–Haenszel statistics
10 Join the raw training table, numeric binned data, and character binned data for analysis
IX CONTENTS OF FUTURE CHAPTERS
Logistic diagnosis and related reports
Fitting logistic regression on training data
lgs_mgmFinalLogisticDiag Logistic Model (Diagnostics)
lgs_MgmGainsChart Gains Chart
lgs_mgmTopChiValues Most Influential Variables
lgs_mgmTopIndexValues Highest Response Variables
lgs_MgmTopTen List of Top 12 scores
lgs_MgmVenn Comparison of covariate contributions to top percentile
Final PDF presentation of results (slide deck)
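
Steps 6 and 8 of the method bin covariate levels into groups with common odds ratios. As a rough illustration of that idea only, the Python sketch below computes the smoothed log odds of response for each level of one character covariate, then groups adjacent levels whose log odds are close. It is not the repository's SLC program: the greedy gap rule, the 0.5 threshold, the 0.5 smoothing constant, the function names, and the sample counts are all assumptions made up for this example.

```python
import math
from collections import defaultdict


def make_rows(level, events, non_events):
    """Build hypothetical (level, response) pairs for one covariate level."""
    return [(level, 1)] * events + [(level, 0)] * non_events


def level_log_odds(rows, smooth=0.5):
    """Return {level: log odds of response}, with additive smoothing (an assumption)."""
    counts = defaultdict(lambda: [0, 0])  # level -> [events, non_events]
    for level, response in rows:
        counts[level][0 if response == 1 else 1] += 1
    return {
        level: math.log((ev + smooth) / (nev + smooth))
        for level, (ev, nev) in counts.items()
    }


def bin_levels(log_odds, max_gap=0.5):
    """Sort levels by log odds; start a new bin when the gap to the previous level exceeds max_gap."""
    ordered = sorted(log_odds, key=log_odds.get)
    bins, current = [], [ordered[0]]
    for prev, level in zip(ordered, ordered[1:]):
        if log_odds[level] - log_odds[prev] > max_gap:
            bins.append(current)
            current = []
        current.append(level)
    bins.append(current)
    return bins


# Hypothetical data: five levels, with '?' standing for the recoded missing value.
rows = (
    make_rows("A", 10, 90)
    + make_rows("B", 12, 88)
    + make_rows("C", 40, 60)
    + make_rows("D", 45, 55)
    + make_rows("?", 80, 20)
)

bins = bin_levels(level_log_odds(rows))
print(bins)  # [['A', 'B'], ['C', 'D'], ['?']]
```

Levels A and B respond at similar rates and fall into one bin, as do C and D, while the missing-value level '?' stays on its own because its odds are far from the rest. A production binning would normally also test whether merged levels differ significantly (for example with the chi-square statistics named in steps 7 and 9) rather than rely on a fixed gap.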