AdaBoost individual model performance

Thiru
Thiru New Altair Community Member
edited November 5 in Community Q&A
I'm using AdaBoost + KNN on my data, which gives an accuracy of 77.24, along with precision and recall.
AdaBoost is configured with 10 iterations.
Is there any way in RapidMiner to view the performance of the model in each iteration and the weights assigned in successive iterations?
Please let me know. Thanks.

Regards,
Thiru

Answers

  • varunm1
    varunm1 New Altair Community Member
    Hello @Thiru

    Is this what you are looking for? The image shows the inside of the AdaBoost operator, where we calculate the training performance and store it at each iteration using the "Store" operator. The Store operator's entry name is "Adaboost_%{execution_count}"; the %{execution_count} macro gives each iteration's performance its own name. I am not sure whether the AdaBoost weights can be extracted.

    Do let us know if this helps.
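
    On the weights part of the question, here is a minimal sketch in plain Python of what the textbook (discrete) AdaBoost algorithm computes in each iteration: one weight per base model and an updated weight per training example, plus the training accuracy of the ensemble so far. It is only an illustration under assumptions, not how RapidMiner implements the operator: the toy data, the decision-stump base learner and all function names are made up for this example.

```python
import math

# Hypothetical 1-D toy data with labels in {+1, -1}; not from the thread.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` below the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    points = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            # Weighted error = total weight of the misclassified examples.
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps trained so far."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


def adaboost(xs, ys, n_rounds):
    """Run discrete AdaBoost and record what happens in every round."""
    n = len(xs)
    weights = [1.0 / n] * n          # example weights start out uniform
    ensemble, history = [], []
    for round_no in range(1, n_rounds + 1):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # weight of this model
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        correct = sum(1 for x, y in zip(xs, ys) if ensemble_predict(ensemble, x) == y)
        history.append({
            "name": f"Adaboost_{round_no}",
            "model_weight": alpha,
            "train_accuracy": correct / n,
            "example_weights": weights,
        })
    return history


history = adaboost(XS, YS, n_rounds=3)
for entry in history:
    print(entry["name"], round(entry["model_weight"], 4), entry["train_accuracy"])
```

    The entry names only mirror the Adaboost_%{execution_count} convention used in the process; nothing here reads from or writes to a repository.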
  • Thiru
    Thiru New Altair Community Member
    Hello @varunm1,

    Thanks for your reply. Could you please elaborate on how to use "Store" together with macros to get the performance
    during each iteration? I'm relatively new to RapidMiner. I tried the Set Macro and Generate Macro operators in the process, but
    that did not help. I await your reply. Thank you.
  • varunm1
    varunm1 New Altair Community Member
    Hello @Thiru

    You don't need to generate a macro. RapidMiner has predefined macros, and in this case I used the %{execution_count} macro in the Store operator's entry name. AdaBoost iterates 10 times, so there are 10 training performances to save. To keep all 10, each one has to be stored under a name that changes on every iteration, so I used "Adaboost_%{execution_count}" as the name. %{execution_count} counts how many times the operator has executed. Because the Store operator sits inside AdaBoost, it runs 10 times and names the performances Adaboost_1, Adaboost_2, Adaboost_3, ...
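
    To make the naming idea concrete, here is a small Python stand-in for what the macro does. It is not RapidMiner code: `expand_macros` and `ToyStore` are hypothetical helpers written for this sketch, and the stored values are placeholders rather than real performance vectors.

```python
def expand_macros(template, macros):
    """Replace every %{name} placeholder with its current value."""
    result = template
    for name, value in macros.items():
        result = result.replace("%{" + name + "}", str(value))
    return result


class ToyStore:
    """Hypothetical stand-in for a Store operator with an execution counter."""

    def __init__(self, name_template):
        self.name_template = name_template
        self.execution_count = 0   # incremented on every execution
        self.repository = {}       # entry name -> stored object

    def execute(self, obj):
        self.execution_count += 1
        name = expand_macros(self.name_template,
                             {"execution_count": self.execution_count})
        self.repository[name] = obj
        return name


store = ToyStore("Adaboost_%{execution_count}")
for iteration in range(10):                      # 10 boosting iterations
    store.execute(f"placeholder performance for pass {iteration + 1}")

names = list(store.repository)
print(names)
```

    With a fixed name, every pass would overwrite the same entry and only the last performance would survive; the counter in the name is what keeps all ten.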

    Please find the attached .rmp file. Import it into RapidMiner and look inside the AdaBoost operator.

  • Thiru
    Thiru New Altair Community Member
    Hello @varunm1, thanks for your reply. Where can I view the performance of all 10 models? Do you mean the output of the validation operator? I can see them with AdaBoost + Decision Tree, the case you used, but with AdaBoost + KNN I couldn't view all 10 models. Could you please look into this? Thanks.

    Regards,
    Thiru
  • varunm1
    varunm1 New Altair Community Member
    edited April 2020
    Hello @Thiru

    You can't view them directly; you need to store them first using the Store operator. That is what I did in the attached process. You need to change the store location, because the one in my process points to my repository, and name the entry with the macro as described in my earlier post. Once that is done and you run the process, the Store operator saves the results as Adaboost_1, Adaboost_2, ... in the repository location you specified.

    In short: attach the Store operator as I did, point it to a repository location, set the name to Adaboost_%{execution_count}, run the process, and look in that repository location. You will find the results there.
  • Thiru
    Thiru New Altair Community Member
    Hello @varunm1,

    Thanks for your reply. I had only checked the file you sent.
    OK, I got it. I retrieved the stored results in a new process and viewed them.

    1. The Adaboost_1 performance shows 89.04% accuracy, and Adaboost_10 shows 99.13%.
    But the overall model performance is only 67.74%.

    Is it because Adaboost_1 to Adaboost_10 are measured on the training data rather than the test data, and 67.74% comes from the test data?

    2. The file you sent shows results from Adaboost_1 to Adaboost_20, whereas the number of iterations in the AdaBoost operator
    is set to 10. How do we get 20?

    I await your reply on the above. Thanks.

    Regards,
    Thiru


  • varunm1
    varunm1 New Altair Community Member
    edited April 2020 Answer ✓
    Hello @Thiru

    1. AdaBoost tries to improve a learner by focusing each iteration on the samples misclassified so far when it builds the next classifier. All of this happens on the training side. The outcome is an ensemble of decision trees, which is then applied to the test data to check how well the trained model performs. So Adaboost_1 to Adaboost_10 are training performances, and they show the model improving on the training data. The test performance is only about 67%, which means the model is overfitting or you still need to tune its parameters.

    2. Yes, you will get 20 if the "mod" port of the validation operator is connected. When that port is connected to another operator or to a result port, the split validation operator runs the training side twice: once on the training partition (70% of the data for a 70:30 split) to validate, and once on the whole data set after validation is complete. To avoid this, just disconnect the "mod" port of the validation operator. If you want to keep it connected, the results are easy to tell apart: the first 10 performances come from the 70% training data, and performances 11 to 20 come from the whole data set.
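
    The counting in point 2 can be sketched as below. This only mimics the behaviour described above; `run_split_validation` and its parameters are hypothetical, and the 1000-row data set is an assumed size chosen for the example.

```python
def run_split_validation(n_rows, train_percent, boosting_iterations,
                         model_port_connected):
    """Return {entry name: number of training rows} for every stored result."""
    stored = {}
    execution_count = 0  # shared counter of the Store operator inside AdaBoost

    def train_adaboost(n_train_rows):
        nonlocal execution_count
        for _ in range(boosting_iterations):
            execution_count += 1
            # Record which training set this stored performance belongs to.
            stored[f"Adaboost_{execution_count}"] = n_train_rows

    # First pass: train on the training partition, then validate on the rest.
    train_adaboost(n_rows * train_percent // 100)

    # Second pass: only when the model output is used, retrain on all rows.
    if model_port_connected:
        train_adaboost(n_rows)

    return stored


connected = run_split_validation(1000, 70, 10, model_port_connected=True)
disconnected = run_split_validation(1000, 70, 10, model_port_connected=False)

print(len(connected), connected["Adaboost_1"], connected["Adaboost_11"])
print(len(disconnected))
```

    Because the counter is never reset between the two passes, the second pass continues at 11 instead of overwriting entries 1 to 10.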
  • Thiru
    Thiru New Altair Community Member
    Thanks, that clarifies it.

    Regards,
    Thiru
  • Orli
    Orli New Altair Community Member
    edited March 2022
    Hi @varunm1 @Thiru
    I have the same question as Thiru's second one: if the number of iterations in the AdaBoost operator is set to 10, how do we get results for 20 models?
    Could you please tell me the reason?
    Thanks!

    Regards,
    Orli