"Different results for X-Validation (libSVM) in version 4.6
Dear community,
I am upgrading from rapidminer version 4.6 to 5 and I'm having some difficulties that I hope maybe someone can help me with.
I am using a data set consisting of 40 example set rows with 73 attributes (72 numerical + 1 numerical label). If anyone wants to reproduce the steps, here is the data in Excel format: http://jump.fm/PFMGS.
In rapidminer 4.6 I start the wizard, open x-validation with svm, import my data, and start the process. The result is 100% accuracy. Here are some screenshots: http://img696.imageshack.us/img696/2939/rapidminer4results.png
I tried to reconstruct this in rapidminer 5:
- I imported the data into my repository and created a new process
- Since the imported data was marked nominal by RM, I use the Nominal to Numerical converter on the complete dataset
- the output goes into the X-Validation module (default parameters as in RM 4.6); from there the ave output goes to the results
- in the Validation module it looks like this:
-- in the training module there is the libSVM module (C-SVC, rbf kernel, gamma=0, C=32, epsilon=0.0010, same as in RM 4.6)
-- in the testing module I use Apply Model and then the Performance module (same default values as in RM 4.6)
Executing the process results in 90% accuracy. Screenshots: http://img42.imageshack.us/img42/9720/rapidminer5results.png
Did I make a mistake? Thanks for your help.
Alex
Cross-validation results will always differ slightly, since the training set is split randomly into subsets for training and validation. Unless you can ensure that the splitting is performed exactly the same way in each run, you should expect slightly different results. If you notice a huge difference, there may be something wrong.
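To illustrate the point about random splitting, here is a minimal Python sketch (not RapidMiner code; the function name `kfold_indices`, the fold count of 10, and the seed values are assumptions made for illustration). It shuffles 40 row indices, matching the size of the data set above, and deals them into folds. The same seed reproduces the same folds; a different seed generally does not.

```python
import random

def kfold_indices(n_rows, n_folds, seed):
    """Shuffle row indices with a fixed seed and deal them into n_folds folds."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Fold i takes every n_folds-th index starting at position i.
    return [indices[i::n_folds] for i in range(n_folds)]

folds_a = kfold_indices(40, 10, seed=1992)
folds_b = kfold_indices(40, 10, seed=1992)
folds_c = kfold_indices(40, 10, seed=7)

print(folds_a == folds_b)  # same seed, same split
print(folds_a == folds_c)  # different seed, almost certainly a different split
```

With only 40 rows, each test fold holds 4 examples, so a single example that moves between folds can shift the accuracy of that fold by 25 percentage points.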
Hi Folks,
If you import the xls and run the following you'll see what the problem is: the operator "Nominal to Numerical" has replaced each attribute column with 0-39. The fact that it still produces 90% satisfies our gullibility.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Root">
<process expanded="true" height="296" width="915">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="27" y="74">
<parameter key="repository_entry" value="dataset"/>
</operator>
<operator activated="true" class="nominal_to_numerical" expanded="true" height="94" name="Nominal to Numerical" width="90" x="380" y="165"/>
<operator activated="true" class="x_validation" expanded="true" height="112" name="XValidation" width="90" x="648" y="165">
<parameter key="local_random_seed" value="-1"/>
<process expanded="true" height="296" width="432">
<operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="LibSVMLearner" width="90" x="171" y="30">
<parameter key="C" value="32.0"/>
<list key="class_weights"/>
</operator>
<connect from_port="training" to_op="LibSVMLearner" to_port="training set"/>
<connect from_op="LibSVMLearner" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="296" width="432">
<operator activated="true" class="apply_model" expanded="true" height="76" name="ModelApplier" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="238" y="30"/>
<connect from_port="model" to_op="ModelApplier" to_port="model"/>
<connect from_port="test set" to_op="ModelApplier" to_port="unlabelled data"/>
<connect from_op="ModelApplier" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="XValidation" to_port="training"/>
<connect from_op="XValidation" from_port="model" to_port="result 1"/>
<connect from_op="XValidation" from_port="training" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
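What the process above does to the data can be sketched in a few lines of Python (an illustration only, assuming the operator assigns one integer code per distinct nominal value; `index_code` is a hypothetical helper and the sample strings are borrowed from this thread). Index coding throws away the measurement and keeps only "which distinct string was this", whereas parsing keeps the value.

```python
def index_code(column):
    """Replace each distinct string by the order in which it first appears."""
    mapping = {}
    coded = []
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)
        coded.append(mapping[value])
    return coded

column = ["2.3647619e+000", "9.5738476e-001", "9.6855298e-001"]

print(index_code(column))          # category indices: [0, 1, 2]
print([float(v) for v in column])  # the actual measurements
```

A column of 40 distinct strings coded this way becomes 0 to 39, which is what the result view shows.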

PS Rather ironically, if you replace the offending operator with a "Guess Types" operator all is well, like this....
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Root">
<process expanded="true" height="296" width="915">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="27" y="74">
<parameter key="repository_entry" value="dataset"/>
</operator>
<operator activated="true" breakpoints="after" class="guess_types" expanded="true" height="76" name="Guess Types" width="90" x="447" y="165"/>
<operator activated="true" class="x_validation" expanded="true" height="112" name="XValidation" width="90" x="648" y="165">
<parameter key="local_random_seed" value="-1"/>
<process expanded="true" height="296" width="432">
<operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="LibSVMLearner" width="90" x="171" y="30">
<parameter key="C" value="32.0"/>
<list key="class_weights"/>
</operator>
<connect from_port="training" to_op="LibSVMLearner" to_port="training set"/>
<connect from_op="LibSVMLearner" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="296" width="432">
<operator activated="true" class="apply_model" expanded="true" height="76" name="ModelApplier" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="238" y="30"/>
<connect from_port="model" to_op="ModelApplier" to_port="model"/>
<connect from_port="test set" to_op="ModelApplier" to_port="unlabelled data"/>
<connect from_op="ModelApplier" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Guess Types" to_port="example set input"/>
<connect from_op="Guess Types" from_port="example set output" to_op="XValidation" to_port="training"/>
<connect from_op="XValidation" from_port="model" to_port="result 1"/>
<connect from_op="XValidation" from_port="training" to_port="result 2"/>
<connect from_op="XValidation" from_port="averagable 1" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
Hi,
what exactly is the problem with the Nominal to Numerical operator? Its behavior is exactly as it was in 4.x if you don't change the default parameter settings. Please remember that in 4.x you had to wrap the Nominal to Numerical operator in an AttributeSubsetPreprocessing operator to restrict the attributes it was working on. You can now either use the equivalent Select Subset operator or simply use the built-in filter.
Greetings,
Sebastian
Sebastian,
thank you for your answer. I imported values from a csv file that looked like this.
2.3647619e+000,9.5738476e-001,9.6855298e-001,...
Unfortunately the real values were recognized as nominal, so I wanted to use the Nominal to Numerical operator to mark them as numerical. But that operator simply converted the values to numerical 1, 2, 3 and so on. So I guess I just misunderstood the intention of the operator. I needed a 'real' converter.
My problem still remains. I cannot import the data as numerical, but at least I could figure out why. My data is in scientific notation (Matlab standard). A value with the exponent != 000 is correctly imported as numerical (real), whereas a value with the exponent == 000 is imported as nominal.
so
2.6855298e-001 is correctly imported as numerical
and
2.3647619e+000 is incorrectly imported as nominal.
I would really appreciate it if anyone has a solution for me. Again, RM4 correctly imports those values as numerical.

Thanks!
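One possible workaround, sketched in Python outside RapidMiner (an assumption, not something proposed in this thread): rewrite the three-digit exponents to a shorter form before importing the file. `normalize_exponent` and `normalize_line` are hypothetical helpers, and whether the RapidMiner 5 importer then recognizes every value as real has not been verified here.

```python
import re

# Matches the exponent marker plus any leading zeros that precede a final digit.
_EXPONENT = re.compile(r"([eE][+-]?)0+(\d)")

def normalize_exponent(token):
    """Strip leading zeros from the exponent, e.g. e+000 -> e+0, e-001 -> e-1."""
    return _EXPONENT.sub(r"\1\2", token)

def normalize_line(line):
    """Apply the rewrite to every comma-separated field of one CSV line."""
    return ",".join(normalize_exponent(field) for field in line.split(","))

line = "2.3647619e+000,9.5738476e-001,9.6855298e-001"
print(normalize_line(line))
```

The rewrite does not change any numeric value; it only changes how the exponent is spelled.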
Sebastian,
thanks for your help. Unfortunately that did not solve the problem. The Parse Numbers operator still labels numbers like 2.3647619e+000 as nominal, but I want them to be numerical/real.
See screenshot: http://img684.imageshack.us/img684/7505/nominalnumericalproblem.png
Any idea how I can achieve that?
Hi Folks,
http://rapid-i.com/rapidforum/index.php/topic,1791.msg7012.html#msg7012
Using the solution so darkly hidden therein on this csv data..
2.6855298e-001,2.3647619e+000
2.3647619e+000,2.6855298e-001
I find that the numbers are read as reals by the following code...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="-20" width="-50">
<operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="28" y="43">
<parameter key="file_name" value="C:\Documents and Settings\Alien\My Documents\rm_workspace\R5 Forum\scients.csv"/>
</operator>
<operator activated="true" class="guess_types" expanded="true" height="76" name="Guess Types" width="90" x="169" y="43"/>
<connect from_op="Read CSV" from_port="output" to_op="Guess Types" to_port="example set input"/>
<connect from_op="Guess Types" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Haddock,
thank you for your help. Your solution works partially... I'm getting weird behavior here:
In your example, the values are labeled as real in the results workspace (screenshot: http://img140.imageshack.us/img140/6470/88436391.png)
but I need to work with the values in the process. THERE the same values in that example are labeled nominal (screenshot: http://img179.imageshack.us/img179/6517/18861165.png)
So in the process I cannot use the values as input for libSVM etc. I really don't understand this, maybe someone can explain/post a solution?
Hi Alexx,
the reason is quite simple: everything is fine and this is just the way "Guess Types" behaves. It guesses the types from the real data (which is not available during the meta data transformation) and not from the meta data. That means that the meta data cannot be correctly updated during process design. I would recommend running Haddock's process and storing the data in the RM repository. There, you will easily see that the type is correct. Then just use the data from the repository and feed it into the learner and everything will be fine.
Alternatively, you could simply feed the data into the LibSVM after the transformation process. It will complain, but you can disable those complaints in the preferences: simply activate "general.capabilities.warn". However, the best way is to use the repository here.
Cheers,
Ingo
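The distinction Ingo describes can be sketched in Python (an illustration only, not how RapidMiner implements Guess Types; `guess_type` and the variable names are hypothetical). A declared type can be read without touching the data, but a guessed type only exists once the values themselves have been inspected, which is why it is unknown while the process is being designed.

```python
def guess_type(values):
    """Guess a column type by trying to parse every value (needs the real data)."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return "nominal"
    # All values parsed; call the column integer only if nothing has a fraction.
    return "integer" if all(n.is_integer() for n in numbers) else "real"

declared_type = "nominal"  # what the import step recorded for the column
column = ["2.6855298e-001", "2.3647619e+000"]

print(declared_type)       # known before the process runs
print(guess_type(column))  # known only after reading the values
```

Storing the converted data in the repository, as suggested above, amounts to saving the guessed type so that later processes can read it as a declared one.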
did you ever try to set gamma != 0? If I understand correctly, gamma=0 means that it will effectively be set to 1 / num_attributes. I would recommend fixing it to the same value in both versions (1/72) to get comparable results. I also noticed a difference in the random_seed parameter of the X-Validation operator, which could affect the process.
I'm curious if this changes anything!
Just my two cents
Greetings, Harald
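A short Python sketch of the arithmetic behind this suggestion, assuming (as Harald does) that gamma=0 is replaced by 1 / number of attributes. The helper names and the sample vectors are made up for illustration; the RBF kernel itself is the standard exp(-gamma * ||x - y||^2).

```python
import math

def effective_gamma(gamma, num_attributes):
    """Return gamma, falling back to 1/num_attributes when gamma is 0 (assumed rule)."""
    return gamma if gamma != 0 else 1.0 / num_attributes

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

gamma = effective_gamma(0, 72)   # 72 numerical attributes, as in the data set
print(round(gamma, 5))           # 0.01389

x = [1.0] * 72
y = [0.0] * 72
print(rbf_kernel(x, y, gamma))   # squared distance is 72, so roughly exp(-1)
```

If the two versions resolve gamma=0 differently, or count the attributes differently, the kernel values and therefore the model differ, so setting gamma explicitly removes one variable from the comparison.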