Missing Attribute when applying model although Training exampleSet and real DataSet are the same

Hi,

I am new to RapidMiner and I am trying to classify YouTube comments on innovation products as customer requirements or not.

Both ExampleSets should be the same, because I took the wordlist from the training data and applied it, via the Process Documents from Data operator, to the data I want to classify. The following picture shows a comparison of both data sets.

I used RapidMiner Auto Model to create an SVM classification process and stored the model with that process. I then used the following process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">

<macro>

<value>Lets try the relly simple way. I like smart watches</value>

</macro>

</macros>

</context>

</operator>

</operator>

</operator>

</operator>

</process>

</operator>

</process>

However, I always get this error:

I used this tutorial on YouTube (video ID VbNhvYQZ2v0) and the RapidMiner Academy Text Mining and Machine Learning course to construct my processes.

This process shows the preprocessing for my training data:

<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">

</context>

</operator>

</operator>

</operator>

</operator>

</operator>

</operator>

</process>

</operator>

</operator>

</operator>

</list>

</operator>

</list>

</operator>

</operator>

</process>

</operator>

</process>

And this process shows the preprocessing for the data I want to classify:

<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">

</context>

</operator>

</operator>

</operator>

</operator>

</operator>

</operator>

</process>

</operator>

</operator>

</list>

</operator>

</list>

</operator>

</operator>

</process>

</operator>

</process>

I also attached 100 rows of my sample data. I apologize if this problem has already been solved (I couldn't find anything useful for my situation) or if I made some simple mistake.

I would also be really grateful if someone knows how to correct English spelling. I already tried the Python script using TextBlob posted in the RapidMiner community, but it changes words that are already correct, for example "Big" to "Fig".

Thanks in advance,

Tennessee
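On the spelling question, one way to avoid "Big" turning into "Fig" is to leave any word that is already in a trusted vocabulary untouched and only correct the rest. The following is a minimal sketch of that idea using Python's standard `difflib` module, not the TextBlob script mentioned above. The function name `correct_word`, the tiny `known_words` list, and the `cutoff` of 0.8 are assumptions chosen for illustration.

```python
import difflib

def correct_word(word, known_words, cutoff=0.8):
    """Return the word unchanged if it is known, else the closest known word.

    known_words is a hypothetical trusted vocabulary (lowercase strings).
    Words with no sufficiently close match are also returned unchanged.
    """
    lowered = word.lower()
    if lowered in known_words:
        # Already a valid word: never "correct" it.
        return word
    matches = difflib.get_close_matches(lowered, known_words, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Illustrative vocabulary; a real one would come from a dictionary file.
known_words = ["big", "fig", "really", "simple", "smart", "watches"]
corrected = [correct_word(w, known_words) for w in ["Big", "relly", "xyzzy"]]
print(corrected)
```

A higher `cutoff` makes the corrector more conservative, which is usually the safer choice for short comment text.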


Accepted answers

Tennessee

Okay, so I copied the SVM model operator from the process that Auto Model created into a Cross Validation and built another model. I can feed this model the data I need to classify without getting the error message, and now it works smoothly. But still, 3 days wasted.

Hence I recommend: if you have problems with the model created by Auto Model, copy the model operator into a Cross Validation.

All comments

sgenzer

Hi @Tennessee, OK, I was able to look at this. Your Process Documents preprocessing is NOT the same in training and testing, and it needs to be.

Image: https://us.v-cdn.net/6030995/uploads/editor/5b/54xccxk5gqkk.png

Scott

Telcontar120

Also be very careful with wordlists. You really need to store the wordlist from your original model construction process and then make sure you use that same wordlist when applying the model in the future, otherwise differences in the text you are processing can lead to incompatible results.
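The wordlist advice can be illustrated outside RapidMiner. The sketch below is plain Python, not the Process Documents operator: it builds a wordlist once from training text and then counts only those words in new text, so every vector has the same columns in the same order. The function names and the whitespace tokenizer are assumptions made for the example.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()

def build_wordlist(documents):
    """Collect the sorted set of tokens seen in the training documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})

def vectorize(document, wordlist):
    """Count only tokens that are in the stored wordlist, in wordlist order."""
    counts = Counter(tokenize(document))
    return [counts[word] for word in wordlist]

training_docs = ["i like smart watches", "smart phones are expensive"]
wordlist = build_wordlist(training_docs)  # store this alongside the model

# Words not in the wordlist ("cheap", "glasses") are ignored when scoring.
new_doc = "I like cheap smart glasses"
vector = vectorize(new_doc, wordlist)
print(wordlist)
print(vector)
```

Because the wordlist is fixed at training time, new text can never add or remove columns. It can only change the counts.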

Tennessee

The preprocessing steps are not the same because of the wordlist I saved while creating the training data for the model. I used this wordlist as input for the Process Documents from Data operator, which allows me to leave out certain preprocessing steps, as per the RapidMiner text mining tutorial on YouTube. Also, if I hadn't used the wordlist and had instead used the same preprocessing steps for both training and real data, I would have gotten different attributes in my prepped tables. Are you sure this is the problem?

Tennessee

If you compare my last two processes, you can see that I store a wordlist in the first process and use it in the second one. The first picture also shows that I have the same number of attributes in both ExampleSets. That would not be possible if I used the Generate n-Grams operator in both preprocessing steps without a wordlist, would it?

Thanks in advance,

Tennessee
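One caveat on comparing attribute counts: two tables can have the same number of attributes and still differ in their names, which is enough to trigger a missing-attribute error when a model is applied. A quick check is to diff the names themselves. The sketch below is plain Python with made-up attribute names, and `compare_attributes` is a hypothetical helper, not a RapidMiner function.

```python
def compare_attributes(training_attributes, scoring_attributes):
    """Report attribute names present in one set but not the other."""
    training = set(training_attributes)
    scoring = set(scoring_attributes)
    return {
        "missing_in_scoring": sorted(training - scoring),
        "extra_in_scoring": sorted(scoring - training),
    }

# Hypothetical example: both lists have three attributes, yet they differ.
training_attributes = ["smart", "watch", "smart_watch"]
scoring_attributes = ["smart", "watch", "watches"]

report = compare_attributes(training_attributes, scoring_attributes)
print(report)
```

Any name listed under `missing_in_scoring` is one the model expects but would not find in the data being classified.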

Tennessee

I have also tried feeding the model its own training data, and I still get the same error message.

I used RapidMiner Auto Model to create this model, and it can't even accept its own training data. Something seems very wrong. I'll try creating the model manually.

Tennessee