How to feed a new data set to an already trained and tested model

Thiru
Thiru New Altair Community Member
edited November 5 in Community Q&A
I've arrived at a supervised model after training it and checking its performance on test data.
Now I want to check the model against a fresh, different field data set.

1. In that case, does the new data set need a separate label column, given that the model is already trained and tested? I assume the label column is required in a supervised model only for training and testing.

2. I used the Store operator to store the trained and tested model. If I feed it the new data set, the result is "NO = 100% and YES = 0" (it's a binomial classification problem), which is wrong. Can I have some help resolving this, and with the correct way of giving a fresh data set to the already verified model?

Thanks in advance.
Regards,

thiru



Answers

  • varunm1
    varunm1 New Altair Community Member
    For your question 1: no, you don't need a label column to make predictions on new data.

    For question 2, it all depends on how accurate your trained model is. Also keep in mind that the new data you are trying to predict should follow a distribution close to that of the training data.

      If I feed it the new data set, the result is "NO = 100% and YES = 0" (it's a binomial classification problem), which is wrong.
    How can you say it's wrong? Do you have labels for this new data? If not, we cannot call the result wrong, provided your trained model is highly accurate and the new data has a similar distribution.
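One quick way to check whether the new data resembles the training data is to compare per-attribute ranges. The sketch below is plain Python rather than a RapidMiner process; the attribute names (`pressure`, `temperature`) and all values are made up for illustration. It reports, for each attribute, the fraction of new rows that fall outside the minimum/maximum seen in training.

```python
from statistics import mean


def summarize(rows, attributes):
    """Return {attribute: (min, mean, max)} for a list of dict rows."""
    summary = {}
    for name in attributes:
        values = [row[name] for row in rows]
        summary[name] = (min(values), mean(values), max(values))
    return summary


def out_of_range_fraction(train_rows, new_rows, attributes):
    """Per attribute, the fraction of new rows outside the training min/max."""
    train_summary = summarize(train_rows, attributes)
    fractions = {}
    for name in attributes:
        low, _, high = train_summary[name]
        outside = sum(1 for row in new_rows if not (low <= row[name] <= high))
        fractions[name] = outside / len(new_rows)
    return fractions


# Hypothetical data: training set already scaled to 0..1,
# field data with one attribute still in raw units.
train_rows = [{"pressure": 0.10, "temperature": 0.40},
              {"pressure": 0.55, "temperature": 0.70},
              {"pressure": 0.90, "temperature": 0.20}]
new_rows = [{"pressure": 12.0, "temperature": 0.50},
            {"pressure": 48.0, "temperature": 0.30}]

fractions = out_of_range_fraction(train_rows, new_rows, ["pressure", "temperature"])
print(fractions)
```

An attribute whose fraction is close to 1 suggests the new data is on a different scale from the training data, in which case a one-sided prediction would not be surprising.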
  • Thiru
    Thiru New Altair Community Member
    Thanks. Will revert.
  • Thiru
    Thiru New Altair Community Member
    Hello varunm1, thanks for the reply. I've checked. Yes, the data distribution is different.

    I'm using two types of data sets:
    type A: labelled data to train/test, and
    type B: unlabelled data.

    1. Type A data set: all the descriptive features are already available normalised to the 0 to 1 range, relative to each attribute's full-scale value.

    2. Type B data set: not available on a normalised scale. Moreover, the full-scale values of its attributes differ from those of the corresponding attributes in type A.

    I tried using z-transformed values for both type A and type B. I'm not sure whether that is correct. Thanks in advance.

    Regards,
    thirumurthy m

  • rfuentealba
    rfuentealba New Altair Community Member
    Hello @Thiru,

    I just posted a trick that can help you, and I'll post it here for you too:

    RapidMiner has no way of knowing that both datasets share the same structure, so it doesn't know what kind of preparation they need. But today I have a trick for you, right up my sleeve. It's very basic but might help.

    For example, for training, I have this simple process:





    Instead of retrieving data and making a decision tree inside the process, I make some small modifications:


    ...and I then create a "main" process from where I call the rest:



    Do you see that "Execute Prepare Data" operator being called twice? It is the result of dragging and dropping the process you want to execute. You can actually save a lot of time if you embed your code like this, as you can reuse your data.

    If your two sets have different but potentially convertible schemas, you can make as many "subprocesses" as you need to transform your data (notice the quotes: RapidMiner also has an operator called Subprocess, which is a little different).
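The same idea can be sketched outside RapidMiner: one shared preparation step, called once for the training data and once for the new data. This is plain Python, not a RapidMiner process. The attribute names and full-scale values are invented, the threshold rule is only a toy stand-in for a real stored model, and it assumes raw values and full-scale values are known for both sets (if type A is already scaled, only type B needs the step).

```python
# Hypothetical full-scale values: each data source has its own (assumed, not from the thread).
FULL_SCALE_A = {"pressure": 100.0, "temperature": 200.0}
FULL_SCALE_B = {"pressure": 250.0, "temperature": 500.0}


def prepare_data(rows, full_scale):
    """Shared preparation: divide each raw attribute by its full-scale value (result in 0..1)."""
    return [{name: row[name] / full_scale[name] for name in full_scale} for row in rows]


def train_model(rows, labels, attribute):
    """Toy stand-in for a real learner: threshold halfway between the two class means."""
    yes = [row[attribute] for row, label in zip(rows, labels) if label == "YES"]
    no = [row[attribute] for row, label in zip(rows, labels) if label == "NO"]
    threshold = (sum(yes) / len(yes) + sum(no) / len(no)) / 2
    return {"attribute": attribute, "threshold": threshold}


def apply_model(model, rows):
    """Score rows; no label column is needed at this stage."""
    return ["YES" if row[model["attribute"]] > model["threshold"] else "NO" for row in rows]


# Type A: labelled raw data, used for training.
train_raw = [{"pressure": 20.0, "temperature": 60.0},
             {"pressure": 30.0, "temperature": 90.0},
             {"pressure": 80.0, "temperature": 120.0},
             {"pressure": 90.0, "temperature": 150.0}]
train_labels = ["NO", "NO", "YES", "YES"]

# Type B: unlabelled field data in raw units with a different full scale.
new_raw = [{"pressure": 50.0, "temperature": 100.0},
           {"pressure": 200.0, "temperature": 400.0}]

model = train_model(prepare_data(train_raw, FULL_SCALE_A), train_labels, "pressure")

scored = apply_model(model, prepare_data(new_raw, FULL_SCALE_B))  # same preparation step
unprepared = apply_model(model, new_raw)                          # preparation skipped

print(scored)
print(unprepared)
```

With these made-up numbers, the prepared field data splits across both classes, while the unprepared rows all land in a single class, which is the kind of symptom described in the question.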

    Hope this helps,

    Rodrigo.