Model performance estimation
Dear All,
I have a relatively small dataset with 130 samples and 2150 attributes, and I want to build a classifier to predict 2 classes. Apparently, I need to reduce the number of attributes to avoid overfitting, so I could use, e.g., RFE-SVM to reduce the number of attributes to one tenth of the number of samples, which is 13. I'm using a Logistic Regression model, and I need to fine-tune parameters such as lambda and alpha. After reading the very informative blog post from Ingo, I would like some help with the practical implementation. May I kindly ask a more experienced member to check the following workflow? Can I trust this implementation, and in particular its performance estimates? Is it good practice to compare the performance from CV with that from a single hold-out set? And if so, should these numbers be more or less the same?
Many thanks in advance,
npapan69
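
For readers following along outside the original tool, the setup described in the post can be sketched roughly as follows. This is an illustrative Python/scikit-learn sketch, not the poster's actual workflow. The data are synthetic (`make_classification`), the hold-out size of 30 and the fold counts are arbitrary choices, and only `C` (inverse regularisation strength, loosely playing the role of lambda) is tuned; the alpha mixing parameter is left out for brevity. The point it tries to show is that feature selection and tuning both sit inside the cross-validation loop, so neither ever sees the test folds or the hold-out set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 130 samples, 2150 attributes.
X, y = make_classification(n_samples=130, n_features=2150, n_informative=20,
                           n_redundant=0, random_state=0)

# Stratified hold-out set, kept away from all selection and tuning steps.
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=30, stratify=y, random_state=0)

# Scaling, RFE with a linear SVM (13 attributes kept) and the classifier are
# chained, so every step is re-fitted on the training part of each split.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(SVC(kernel="linear"), n_features_to_select=13, step=0.1)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Inner loop tunes C; outer loop estimates performance of the whole procedure.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="accuracy")

# Nested cross-validation on the development data.
outer_scores = cross_val_score(search, X_dev, y_dev, cv=outer_cv,
                               scoring="accuracy")
print("nested CV accuracy: %.3f +/- %.3f"
      % (outer_scores.mean(), outer_scores.std()))

# Refit on all development data, then score once on the hold-out set.
search.fit(X_dev, y_dev)
holdout_score = search.score(X_hold, y_hold)
n_selected = int(search.best_estimator_.named_steps["select"].support_.sum())
print("hold-out accuracy: %.3f (%d attributes kept)"
      % (holdout_score, n_selected))
```

With only about 30 hold-out samples, the hold-out accuracy carries a standard error of several percentage points, so in a sketch like this the two estimates can be expected to agree only roughly.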