performance of testing data

rafeena
rafeena New Altair Community Member
edited November 2024 in Community Q&A
hi,

I have included images showing how I did my classification. I would like to know how to view the performance on my testing data. Hopefully what I am doing here is correct.

thanks 

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @rafeena,

    Your process is correct. Your performance vector is given by the ave output port of the Validation operator.
    Do you encounter any error with this process?

    Regards,

    Lionel
  • rafeena
    rafeena New Altair Community Member
    Hi lionelderkrikor, it didn't give me any problem. However, I would like to see the performance on my testing file, the file named "retrieve testing data". I believe the performance I get now is for my training data.
  • IngoRM
    IngoRM New Altair Community Member
    Hi,
    Just add another Performance operator after the Apply Model (2) which will then calculate the error rates for the provided test data.
    Side note: what you have now is not really the training error but the estimate of the test error from a cross-validation. The true training error is what you would get by applying the model to the complete training data again and calculating the performance for that.
    The cross-validated error and the test error should be similar (provided you have enough data and both sets follow the same distribution).
    Hope this helps,
    Ingo
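The three numbers Ingo distinguishes (training error, cross-validated estimate, and error on a separate labelled test set) can be sketched outside RapidMiner. This is a minimal Python illustration only, not the process from the thread: the 1-nearest-neighbour model, the toy data, and all function names are assumptions made for the example.

```python
# Hypothetical toy data: (feature, label) pairs. Not taken from the thread.
training = [(10, "a"), (12, "a"), (19, "b"), (21, "a"), (30, "b"), (32, "b")]
testing = [(11, "a"), (18, "a"), (31, "b")]  # a separate test set WITH true labels


def predict_1nn(train, x):
    """Return the label of the training row whose feature is closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train, rows):
    """Fraction of `rows` whose label is predicted correctly by a model built on `train`."""
    hits = sum(1 for x, y in rows if predict_1nn(train, x) == y)
    return hits / len(rows)


def cross_validated_accuracy(data, k):
    """Average held-out accuracy over k folds (fold i holds rows i, i+k, i+2k, ...)."""
    scores = []
    for i in range(k):
        held_out = data[i::k]
        rest = [row for j, row in enumerate(data) if j % k != i]
        scores.append(accuracy(rest, held_out))
    return sum(scores) / k


# Training accuracy: the model is applied to the same rows it was built from.
train_acc = accuracy(training, training)
# Cross-validated accuracy: an estimate of test accuracy using only the training file.
cv_acc = cross_validated_accuracy(training, k=3)
# Test accuracy: the analogue of a Performance operator placed after Apply Model.
test_acc = accuracy(training, testing)

print(train_acc, cv_acc, test_acc)
```

In this toy setup the training accuracy is optimistic because every row is its own nearest neighbour, which is why it should not be confused with the cross-validated estimate.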
  • rafeena
    rafeena New Altair Community Member
    IngoRM hi. I did it like you said but the result is not good: the accuracy is 0.
  • IngoRM
    IngoRM New Altair Community Member
    Well, I see that you have changed your process a bit. You seem to select some columns in the training path. Are you sure that you apply the same data transformations to the test data as well?
  • rafeena
    rafeena New Altair Community Member
    @IngoRM I am actually running two processes: one selects features using TF-IDF and one using entropy. Can you explain more about the data transformations? I probably didn't execute them all.
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @rafeena,

    What Ingo said means that you have to apply exactly the same preprocessing steps to both your training dataset and your test dataset.
    From the screenshot in your previous post, it seems that you are selecting only some features (via the Weight by Information Gain / Select by Weights operators) during your training step.
    You have to apply exactly the same selection to your test data.
    To get a more specific answer, please share your process(es) and all your dataset(s).

    Regards,

    Lionel
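The idea of reusing the training-time selection on the test data can be sketched as follows. This is a minimal Python analogy under stated assumptions, not the RapidMiner process itself: the attribute names, the weights, and the helper functions `select_top_k` and `keep_columns` are hypothetical.

```python
def select_top_k(weights, k):
    """Names of the k attributes with the highest weight, highest first."""
    ranked = sorted(weights, key=lambda name: weights[name], reverse=True)
    return ranked[:k]


def keep_columns(rows, columns):
    """Keep only `columns` in every row (a dict); fail loudly if one is missing."""
    projected = []
    for row in rows:
        missing = [c for c in columns if c not in row]
        if missing:
            raise KeyError(f"attributes missing from data: {missing}")
        projected.append({c: row[c] for c in columns})
    return projected


# Hypothetical attribute weights, as if computed on the TRAINING data only.
train_weights = {"word_a": 0.42, "word_b": 0.05, "word_c": 0.31, "word_d": 0.01}
selected = select_top_k(train_weights, k=2)

train_rows = [{"word_a": 1, "word_b": 0, "word_c": 3, "word_d": 0}]
test_rows = [{"word_a": 0, "word_b": 2, "word_c": 1, "word_d": 5}]

train_selected = keep_columns(train_rows, selected)
# Reuse the SAME list of attributes; do not recompute the weights on the test data.
test_selected = keep_columns(test_rows, selected)

print(selected)
print(test_selected)
```

Recomputing the weights on the test data could pick a different set of attributes, which is one way to end up with a model and a test set whose attributes no longer match.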


  • rafeena
    rafeena New Altair Community Member
    Hi @lionelderkrikor, I have applied the same steps but it says that the attributes do not match, although I believe the attributes I used are all the same. Anyway, I have included my datasets. My processes are as shown in the pictures above.
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @rafeena,

    The working process is in the attached file.
    I am able to obtain a test performance (accuracy) of around 70% (calculated by the Cross Validation operator).

    Hope this helps,

    Regards,

    Lionel

    PS: You cannot calculate the "test error" from your dataset "testing data2.1" because it does not contain the true label.
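Why a missing label rules out a test accuracy can be shown with a small sketch. It is a minimal Python illustration, assuming rows stored as dicts with an optional `label` key; the data and the `accuracy` helper are hypothetical and not part of the attached process.

```python
def accuracy(rows, predictions, label_key="label"):
    """Share of predictions that equal the true label stored in each row."""
    if any(label_key not in row for row in rows):
        raise ValueError("no true label available: accuracy cannot be computed")
    hits = sum(1 for row, pred in zip(rows, predictions) if row[label_key] == pred)
    return hits / len(rows)


# Hypothetical labelled rows: predictions can be compared against the truth.
labelled = [
    {"text": "post 1", "label": "yes"},
    {"text": "post 2", "label": "no"},
    {"text": "post 3", "label": "yes"},
]
labelled_score = accuracy(labelled, ["yes", "no", "no"])

# Hypothetical unlabelled rows: the model can still predict, but nothing can be scored.
unlabelled = [{"text": "post 4"}, {"text": "post 5"}]
try:
    unlabelled_score = accuracy(unlabelled, ["yes", "no"])
except ValueError:
    unlabelled_score = None

print(labelled_score, unlabelled_score)
```

The model can still be applied to unlabelled rows to obtain predictions; only the comparison against the truth is impossible.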
  • rafeena
    rafeena New Altair Community Member
    Thank you very much @lionelderkrikor. When you say test error, does this mean I cannot see the accuracy for testing data 2.1?

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓
    No, you can't!

    Regards,

    Lionel
  • rafeena
    rafeena New Altair Community Member
    @lionelderkrikor Noted, thanks for your help.
  • rafeena
    rafeena New Altair Community Member
    edited January 2020
    @lionelderkrikor I would like to be clear on training and testing data in RapidMiner. If I do it like the process in the photo, the file named testing data 2.1 is not actually used as my testing data, right? Both my training and testing data come from the file formspring training 2, and RapidMiner chooses randomly which rows are used for testing and which for training. Is this correct?