nominal to binominal - strange behaviour
schills
Hi guys
I have tried to use the "Nominal to Binominal" operator to convert a single attribute from nominal to binominal. Should be simple, right? Well, I am having trouble with it. The single attribute the operator is meant to convert is not being converted to binominal; it is being removed! The "meta data" view in the main process window shows one attribute fewer (the one I am trying to convert), so it appears to have been deleted rather than converted.
However, after I run the process and view the results, the attribute has indeed been converted to binominal and has not been deleted. So the issue seems to exist in the main process view only.
This behaviour only occurs when I use data directly from my database (RapidAnalytics). It does not occur when I use the same data from any of the repositories (remote or local).
The issue also does not occur for other type conversion operators, such as "Nominal to Numerical". I can use data from the database with many other operators and it all works fine. Only the "Nominal to Binominal" operator is affected.
Why does "Nominal to Binominal" not work in this situation when all the others do?
I have been doing my own research, and it seems the cause may be that the setting "rapid miner general nominal values limit" has been reached. I tried increasing this limit to a very large number, but still no luck. It does say that a cache refresh of the process metadata is required. How do I do this?
I am working in RapidMiner (locally) to do all this. Should I perhaps update this setting in RapidAnalytics or on the server itself instead?
The dataset has 73 attributes and 451 examples, but only 8 of the attributes are nominal.
Here is the XML for two processes: one reads the data directly from the database, the other reads the same data from the local repository.
1. Database
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.009">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.009" expanded="true" name="Process">
<process expanded="true" height="269" width="346">
<operator activated="true" class="retrieve" compatibility="5.1.009" expanded="true" height="60" name="Retrieve" width="90" x="26" y="138">
<parameter key="repository_entry" value="//DB/BetoRA/Example Sets/dbo.NBA_Atlanta"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.1.009" expanded="true" height="94" name="Nominal to Binominal" width="90" x="246" y="120">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Result"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
2. From the local repository
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.009">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.009" expanded="true" name="Process">
<process expanded="true" height="269" width="346">
<operator activated="true" class="retrieve" compatibility="5.1.009" expanded="true" height="60" name="Retrieve" width="90" x="31" y="178">
<parameter key="repository_entry" value="//NewLocalRepository/Repository/TEST/dbo.NBA_Atlanta"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.1.009" expanded="true" height="94" name="Nominal to Binominal" width="90" x="246" y="120">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Result"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I would be very grateful for any feedback on this issue.
Cheers
Andrew
Answers
Hi,
The metadata view in the tooltip is display-only: it is a visual aid to ease process design. If something in it is wrong, that does not break process execution, as you saw in your case. In some cases it is outright impossible to predict the outcome correctly before actually executing the process (especially when working with databases), so the metadata is sadly somewhat limited.
However, feel free to create a bug report for your case here.
Regards,
Marco
Hi,
Thanks for your reply!
Yes, what you say is true for the simple case above. However, the reason I am changing nominal to binominal is so that I can run the data through an SVM (my nominal label attribute needs to be changed to binominal). That process will not run, because the SVM keeps asking me to add a label attribute. The "Nominal to Binominal" operator seems to delete the attribute, so the process assumes there is no label attribute!
It's strange, because other operators that change the data type, such as "Nominal to Numerical", all work fine. It's just "Nominal to Binominal" that doesn't work!
Thoughts?
Hi,
That is because RapidMiner does not know what values the "Read Database" operator will deliver until it actually executes that operator while running the process. To convert nominal to binominal, RapidMiner creates, for each nominal value of a nominal attribute, a new attribute named "attributeName_attributeValue" and sets it to true/false. Because RapidMiner does not yet know which nominal values there will be, the metadata cannot be created properly; the only certain thing is that the old attribute will be gone. That's the reason for the behaviour you're experiencing.
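To make that concrete, here is a minimal Python sketch of the idea. It is not RapidMiner code and not how the operator is implemented; the function names, the "Team" attribute, and the "W"/"L" values are hypothetical and only illustrate the naming scheme described above. It shows that the new attribute names can only be listed once the nominal values are known, while the removal of the old attribute is certain either way.

```python
def nominal_to_binominal(rows, attribute):
    """Replace one nominal attribute with a true/false attribute per observed value.

    Illustrative only: follows the "attributeName_attributeValue" naming
    described in the reply above, not RapidMiner's actual implementation.
    """
    # The new attribute names depend on the values actually present in the data.
    values = sorted({row[attribute] for row in rows})
    converted = []
    for row in rows:
        # The original nominal attribute is dropped ...
        new_row = {k: v for k, v in row.items() if k != attribute}
        # ... and one boolean attribute per nominal value is added.
        for value in values:
            new_row[f"{attribute}_{value}"] = row[attribute] == value
        converted.append(new_row)
    return converted


def predicted_attributes(known_attributes, attribute, known_values=None):
    """Predict attribute names before execution (a stand-in for the metadata view)."""
    remaining = [a for a in known_attributes if a != attribute]
    if known_values is None:
        # Values unknown (e.g. data not read yet): only the removal is certain.
        return remaining
    return remaining + [f"{attribute}_{v}" for v in sorted(known_values)]


# Hypothetical example data; the real values of "Result" are not given in this thread.
rows = [{"Team": "ATL", "Result": "W"}, {"Team": "ATL", "Result": "L"}]
converted = nominal_to_binominal(rows, "Result")
before_run = predicted_attributes(["Team", "Result"], "Result")
with_values = predicted_attributes(["Team", "Result"], "Result", {"W", "L"})
```

With the values unknown, `before_run` contains one attribute fewer than the input, which matches what the metadata view shows for the database-backed process; `converted` still contains the new true/false attributes, which matches what appears after execution.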
You have at least two options here. You can store the Read Database result via the "Store" operator and later read it via "Retrieve"; then the metadata will work correctly.
Or you can execute the process anyway: if there is a label attribute at runtime (the metadata does not matter!), the SVM will work. If you need to set a label, use the "Set Role" operator before the SVM.
Regards,
Marco