How to feed a new data set into an already trained and tested model

Thiru · February 2020

I've arrived at a supervised model after training it and checking its performance with test data.
Now I want to check the model with a fresh, different field data set.

1. In that case, does the new data set need a separate label column, given that the model is already trained and tested? I assume the label column is required in a supervised model only for training and testing.

2. I used the Store operator to save the trained and tested model. When I feed in the new data set, the result is NO = 100% and YES = 0% (it's a binomial classification problem), which is wrong. Could I have some help resolving this, and what is the correct way to give a fresh data set to an already verified model?

Thanks in advance.
Regards,

Thiru

varunm1 · February 2020

For your question 1: no, you don't need a label column to make predictions on new data.

For question 2: it all depends on how accurate your trained model is. You should also keep in mind that the new data you are trying to predict should follow a distribution close to that of the training data.

"If I feed the new data set, it shows the result NO = 100% and YES = 0% (it's a binomial classification problem), which is wrong."

How can you say it's wrong? Do you have labels for this new data? If not, we cannot say the result is wrong, provided your trained model is highly accurate and the new data has a similar distribution.
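One rough way to check whether the new data resembles the training data is to compare per-attribute summary statistics. The following is only a minimal Python sketch outside RapidMiner; the column names, the sample values and the threshold of 2 standard deviations are all made up for illustration.

```python
from statistics import mean, stdev

def summarize(rows, columns):
    """Return {column: (mean, sample std)} for a list of dict rows."""
    return {c: (mean(r[c] for r in rows), stdev(r[c] for r in rows)) for c in columns}

def shifted_columns(train_rows, new_rows, columns, threshold=2.0):
    """List columns whose new-data mean lies more than `threshold`
    training standard deviations away from the training mean."""
    train_stats = summarize(train_rows, columns)
    new_stats = summarize(new_rows, columns)
    flagged = []
    for c in columns:
        t_mean, t_std = train_stats[c]
        n_mean, _ = new_stats[c]
        if t_std > 0 and abs(n_mean - t_mean) / t_std > threshold:
            flagged.append(c)
    return flagged

# Hypothetical data: training values are already scaled to 0..1,
# while "pressure" in the new data is still in raw units.
train = [
    {"pressure": 0.2, "temp": 0.5},
    {"pressure": 0.4, "temp": 0.6},
    {"pressure": 0.3, "temp": 0.4},
]
new = [
    {"pressure": 120.0, "temp": 0.50},
    {"pressure": 150.0, "temp": 0.55},
    {"pressure": 135.0, "temp": 0.45},
]

print(shifted_columns(train, new, ["pressure", "temp"]))
```

An attribute flagged this way is a hint that the model is seeing values it was never trained on, which could explain a prediction that is 100% one class.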

Thiru · February 2020

Thanks. I will check and report back.

Thiru · February 2020

Hello varunm1, thanks for the reply. I've checked, and yes, the data distribution is different.

I'm using two types of data sets:
type A: labelled data used to train and test, and
type B: unlabelled data.

1. Type A data set: all the descriptive features are already normalised to the range 0 to 1, relative to each attribute's full-scale value.

2. Type B data set: not normalised. Moreover, the full-scale values of its attributes differ from those of the corresponding attributes in type A.

I tried using z-transformed values for both type A and type B. I'm not sure whether that is correct. Thanks in advance.

Regards,
Thirumurthy M

rfuentealba · February 2020

Hello @Thiru,

I just posted a trick elsewhere that can help you, so I'll post it here too:

RapidMiner has no way of knowing that both datasets share the same structure, so it doesn't know what kind of preparation each one needs. But I have a trick up my sleeve for you today. It's very basic but might help.

For example, for training, I have this simple process:

Image: https://us.v-cdn.net/6030995/uploads/editor/lp/gbpu2koyzyfr.png

Instead of retrieving data and making a decision tree inside the process, I make some small modifications:

Image: https://us.v-cdn.net/6030995/uploads/editor/re/qedndn41wqlu.png

...and I then create a "main" process from which I call the rest:

Do you see the "Execute Prepare Data" operator being called twice? It is the result of dragging and dropping the process you want to execute. You can save a lot of time if you embed your processes like this, because you can reuse the same data preparation for both datasets.
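The same idea, written outside RapidMiner as a minimal Python sketch: learn the preparation once on the training data, then apply exactly that preparation to the new data. The function names, the "load" column and the values are hypothetical, and a z-transformation is used only because it was mentioned earlier in the thread.

```python
from statistics import mean, pstdev

def fit_z_params(rows, columns):
    """Learn the mean and population standard deviation per column
    from the training rows only."""
    return {c: (mean(r[c] for r in rows), pstdev(r[c] for r in rows)) for c in columns}

def apply_z(rows, params):
    """Z-transform rows using parameters learned earlier.
    Columns with zero spread are mapped to 0.0."""
    return [
        {c: ((r[c] - m) / s if s > 0 else 0.0) for c, (m, s) in params.items()}
        for r in rows
    ]

# Hypothetical training and new data on the same scale.
train = [{"load": 0.2}, {"load": 0.4}, {"load": 0.6}]
new = [{"load": 0.4}, {"load": 0.8}]

params = fit_z_params(train, ["load"])  # fitted once, on training data
train_z = apply_z(train, params)        # used to train the model
new_z = apply_z(new, params)            # same parameters reused for scoring

print(new_z)
```

Z-transforming each dataset with its own mean and standard deviation would force both to mean 0 and standard deviation 1, which can hide a real difference between them. Reusing the training parameters keeps that difference visible to the model.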

If you have two datasets with different but potentially convertible schemas, you can make as many "subprocesses" as you need to transform your data (notice the quotes: there is also an operator called "Subprocess", but it's a little different).
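For the case in this thread, one such conversion step might bring the raw type B values onto the same 0 to 1 scale as type A by dividing each attribute by its own full-scale value. This is a rough Python sketch only; the attribute names and full-scale values are invented, and it assumes that "fraction of full scale" means the same thing for both data sources.

```python
def to_fraction_of_full_scale(rows, full_scale):
    """Convert raw readings to fractions of their full-scale value,
    so they match a dataset already normalised to 0..1."""
    return [{c: r[c] / full_scale[c] for c in full_scale} for r in rows]

# Hypothetical full-scale values for the type B instruments.
FULL_SCALE_B = {"pressure": 250.0, "temp": 120.0}

# Hypothetical raw type B rows (no label column needed for scoring).
raw_b = [
    {"pressure": 125.0, "temp": 30.0},
    {"pressure": 250.0, "temp": 60.0},
]

prepared_b = to_fraction_of_full_scale(raw_b, FULL_SCALE_B)
print(prepared_b)
```

After this step the type B rows have the same columns and the same range as type A, so they can go through whatever further preparation the training data received before being passed to the stored model.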

Hope this helps,

Rodrigo.
