Altair SLC Chapter II: identifying the best five logistic models
Too long to post here; see the GitHub repo:
https://github.com/rogerjdeangelis/utl-altair-slc-chapter-II-identifying-the-best-five-logistic-models
PREP
1 You need this statement in your autoexec:
libname workx "d:/wpswrkx";
2 You need to copy the macro utl_mdlgetpos.sas to your autocall library.
It is in this repo.
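The two PREP steps could be combined in the autoexec roughly as follows. This is only a sketch: the folder d:/lgs/macros is a hypothetical autocall location (it is not named in this repo), and it assumes your Altair SLC installation honors the standard MAUTOSOURCE and SASAUTOS options.

```sas
/* Autoexec sketch. d:/lgs/macros is a HYPOTHETICAL folder; use your own autocall path. */

/* Step 1: the libname statement required by this chapter */
libname workx "d:/wpswrkx";

/* Step 2: make the folder holding utl_mdlgetpos.sas searchable as an autocall library */
options mautosource                           /* turn on autocall macro lookup        */
        sasautos=("d:/lgs/macros" sasautos);  /* hypothetical folder + default search */
```

With this in place, a call to %utl_mdlgetpos should be resolved from the copied file the first time it is used in a session.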
CONTENTS (all the Excel files are in this repo)
1 Top Chis
d:\lgs\xls\lgs_mgmtopchivalues.xlsx
2 Top odds ratios
d:\lgs\xls\lgs_mgmtopoddsvalues.xlsx
3 VIF (Variance Inflation Factor)
4 Remove high-VIF predictors and recheck
d:/lgs/xls/lgs_stepselect.xlsx
5 Stepwise logistic
6 Run all possible models (136,629)
7 Determine the best number of predictors
8 Select the top 5 models with 5 predictors (the top candidate was also suggested by stepwise)
9 At least two more chapters
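The 136,629 in item 6 is not broken down in this post. One combination that reproduces it exactly is every subset of 4, 5, 6 or 7 predictors drawn from 20 candidates. The 4 to 7 range matches the OUTPUTS table; the count of 20 candidates is an assumption, not a figure stated here. A minimal check:

```sas
/* Sketch: count all 4- to 7-predictor models from an ASSUMED pool of 20 candidates. */
data _null_;
  npred = 20;                  /* assumption: number of candidate predictors */
  total = 0;
  do k = 4 to 7;               /* model sizes kept in the OUTPUTS table      */
    nmodels = comb(npred, k);  /* number of distinct k-predictor subsets     */
    total = total + nmodels;
    put k= nmodels= comma12.;
  end;
  put total= comma12.;         /* 4,845 + 15,504 + 38,760 + 77,520 = 136,629 */
run;
```

If the real candidate pool is a different size, change npred and the total will no longer match item 6.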
INPUTS (outputs of chapter I)
see chapter I
https://github.com/rogerjdeangelis/utl-altair-slc-chapter-I-optimum-binning-in-preparation-for-logistic-regression
   DESCRIPTION                     OBS   TABLE                              COMMENT
1  RAW TRAINING (no binning)    70,000   d:/lgs/lgs_rawTrain.sas7bdat
2  RAW HOLDOUT (no binning)     30,000   d:/lgs/lgs_rawHold.sas7bdat
3  LOGISTIC INPUT BINNED        70,000   d:/lgs/lgs_MgmAllChrNum.sas7bdat
4  NORMALIZED CHAR BINNED      980,000   d:/lgs/lgs_mgmNrmChr.sas7bdat      14 vars * 70,000 obs = 980,000
5  NORMALIZED NUM BINNED       910,000   d:/lgs/lgs_mgmNrmNumSrt.sas7bdat   13 vars * 70,000 obs = 910,000
6  CHR EXCEL REPORT INPUT           40   d:/lgs/lgs_mgmChrCutRpt            14 vars, 40 obs
7  NUM EXCEL REPORT INPUT           46   d:/lgs/lgs_mgmNumCut               13 vars, 46 obs
OUTPUTS
1  Top 100 4-7 predictor models  2,200   d:/lgs/lgs_posmgmmodnrm.sas7bdat   4*100 + 5*100 + 6*100 + 7*100 = 2,200