Classifier Accuracy with Grid Search is not similar to accuracy without Grid Search

Hello everyone. I'm using Grid Search to tune Random Forest parameters. When the process ends, it gives me the best parameter set and the accuracy achieved with it. My question: when I run the process without Grid Search, setting the Random Forest parameters to the values Grid Search found, I get a lower accuracy. Can anyone explain the difference? Both approaches are the same; the only difference is that the first runs inside Grid Search and the second does not.
I have included screenshots of my process.

My dataset is Glass Type: 214 samples, including 1 duplicate row, with 6 imbalanced classes. I run my process as follows:
Send the dataset into the Optimize Parameters (Grid) operator.
Inside the Optimize Parameters (Grid) operator:
1- Remove duplicates
2- Normalize
3- Split the data 80:20
4- Apply SMOTE to the training data only
5- Train RF
6- Evaluate the model

Caperez

Hi @Safa,

That is abnormal behavior if you are using the same datasets, so make sure that is really the case.

For example, I saw that you are using the Split operator. Depending on its parameters, the training and test datasets may vary between runs.

Try the process with fixed training and test datasets and check again.

Best

Safa

Hi @ceaperez, I set the Split operator to stratified sampling with an 80:20 split both times. Correct me if I'm wrong, but I thought the Split operator gives the same 80% for training and the same 20% for testing in both cases?

Caperez

Hi @Safa,
Stratified sampling creates random subsets.
I suggest you use the Split operator once, store the results, and then use the stored example sets in your comparison.
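
To illustrate the point outside RapidMiner, here is a minimal sketch in plain Python. The `stratified_split` function, the toy labels, and the seed values are made up for this example and are not how the Split operator is implemented. It only shows that a stratified split keeps the class proportions but still picks the rows at random, so the split repeats exactly only when the seed (or the stored result) is fixed.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.8, seed=None):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)  # the random part of a stratified split
        cut = round(len(members) * train_ratio)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced labels (hypothetical, not the Glass data).
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20

split_fixed_1 = stratified_split(labels, seed=42)
split_fixed_2 = stratified_split(labels, seed=42)
split_other = stratified_split(labels, seed=7)

print("same seed, same split:", split_fixed_1 == split_fixed_2)
print("different seed, same split:", split_fixed_1 == split_other)
```

Storing the split once and reusing it has the same effect as fixing the seed: both runs then see exactly the same training and test rows.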

Best

Cesar

Safa

Hi @ceaperez
I did as you said: I split the data and stored the results in two separate files.
After that, I ran Grid Search and got the best parameters and accuracy.
Then I tested without Grid Search, but I still get a lower accuracy.
Please check my screenshots and tell me if I'm doing something wrong.

Image: https://us.v-cdn.net/6030995/uploads/editor/ig/hmaxfsluzwf8.png

Image: https://us.v-cdn.net/6030995/uploads/editor/i0/efvwm7db0l7x.png

Caperez

Hi @Safa,

One of the most beautiful things about RapidMiner is that you have a full view of your pipeline and can explore your model step by step.

I saw in your process that the two accuracies are now closer than before. That is because we eliminated one source of randomness.

The SMOTE operator is another one. If you apply SMOTE to the same dataset twice, you will not get the same dataset.
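
As a rough illustration of why that happens, here is a small SMOTE-style sketch in plain Python. The `smote_like` function, the toy points, and the seeds are assumptions made for this example; it is a simplified version of the idea and not the code behind the RapidMiner operator. Each synthetic point is placed at a random position on the line between a randomly chosen minority sample and one of its nearest neighbours, so two runs differ unless the random seed is fixed.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=None):
    """Create n_new synthetic points by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)  # random minority sample
        # k nearest neighbours of the base point within the minority class.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)  # random neighbour
        gap = rng.random()  # random position between base and neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class points (hypothetical values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]

run_a = smote_like(minority, n_new=5, seed=1)
run_b = smote_like(minority, n_new=5, seed=1)
run_c = smote_like(minority, n_new=5, seed=2)

print("same seed, same synthetic rows:", run_a == run_b)
print("different seed, same synthetic rows:", run_a == run_c)
```

The model is then trained on different rows in each run, which is enough to move the accuracy on a small test set.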

I invite you to explore your model using the pipeline, breakpoints, and the Compare Distributions operator from the smile extension.

Best,

Cesar

Safa

Hi @ceaperez, thank you for answering my question, I really appreciate it. I have learned a few things from you.
I used SMOTE only once.
I have also removed SMOTE and tested again without the Split operator, and I still get a lower accuracy. I think using the Performance operator inside Grid Search versus outside it gives slightly different results. Anyhow, thanks.
Best regards