RapidMiner changes my values...

Jorge New Altair Community Member
edited November 5 in Community Q&A
Hi,

I'm working with RapidMiner 4.3 on the following process:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ArffExampleSource" class="ArffExampleSource" breakpoints="after">
        <parameter key="data_file" value="C:\Input.arff"/>
        <parameter key="id_attribute" value="1"/>
        <parameter key="label_attribute" value="example6"/>
    </operator>
    <operator name="InteractiveAttributeWeighting" class="InteractiveAttributeWeighting">
    </operator>
    <operator name="Learn" class="OperatorChain" expanded="yes">
        <operator name="W-NaiveBayesUpdateable" class="W-NaiveBayesUpdateable">
        </operator>
        <operator name="ModelWriter" class="ModelWriter">
            <parameter key="model_file" value="C:\model.mod"/>
            <parameter key="output_type" value="XML"/>
        </operator>
    </operator>
    <operator name="ArffExampleSource (2)" class="ArffExampleSource" breakpoints="after">
        <parameter key="data_file" value="C:\Prediction.arff"/>
        <parameter key="id_attribute" value="1"/>
        <parameter key="label_attribute" value="example6"/>
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="C:\model.mod"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
    </operator>
</operator>
with Input.arff:

@RELATION Input

@ATTRIBUTE Id numeric
@ATTRIBUTE example1 string
@ATTRIBUTE example2 string
@ATTRIBUTE example3 string
@ATTRIBUTE example4 string
@ATTRIBUTE example5 string
@ATTRIBUTE example6 string

@DATA
'1','ex1','hello4','hw1','false','1000k','slow'
'2','ex1','hello6','hw2','true','4000k','slow'
'3','ex1','hello2','hw3','false','500k','slow'
'4','ex1','hello3','hw3','true','2000k','slow'
'5','ex2','hello2','hw2','true','500k','slow'
'6','ex2','hello5','hw1','true','1000k','mid'
'7','ex2','hello2','hw3','false','4000k','fast'
'8','ex3','hello','hw1','true','2000k','mid'
'9','ex3','hello','hw2','true','4000k','fast'
'10','ex3','hello','hw3','false','2000k','slow'
'11','ex3','hello','hw1','false','500k','mid'
and Prediction.arff:

@RELATION Prediction

@ATTRIBUTE Id numeric
@ATTRIBUTE example1 string
@ATTRIBUTE example2 string
@ATTRIBUTE example3 string
@ATTRIBUTE example4 string
@ATTRIBUTE example5 string


@DATA
'100','ex1','hello','hw1','false','1000k'
'101','ex1','hello2','hw2','true','4000k'
'102','ex1','hello','hw2','true','4000k'
'103','ex1','hello2','hw3','true','500k'
'104','ex1','hello','hw2','true','2000k'
'105','ex1','hello2','hw1','true','4000k'
'106','ex2','hello3','hw1','false','500k'
'107','ex3','hello3','hw2','true','4000k'
'108','ex3','hello4','hw3','true','500k'
'109','ex3','hello5','hw3','false','500k'
'110','ex3','hello6','hw2','true','500k'
'111','ex3','hello2','hw1','false','500k'
'112','ex3','hello6','hw1','true','500k'
When I run the process and open the "Data View" of the "Data Table" in the results, the values in the column "example1" differ from the example1 values in Prediction.arff.

Can anyone help me?
Is this only a display error, or does it also affect the learning operator?

Thanks in advance.

Cheers,
Jorge

Answers

  • steffen
    steffen New Altair Community Member
    Hello Jorge

    I got this warning message:

    [Warning] W-NaiveBayesUpdateable: The internal nominal mappings are not the same between training and application for attribute 'example2'. This will probably lead to wrong results during model application.
    RM stores an internal mapping for nominal values, and the models apparently depend on it. I suggest the following workaround:
    -> Load both files and add an attribute marking each example as train/prediction (AttributeConstruction and ChangeAttributeRole)
    -> Merge them (ExampleSetMerge)
    -> Save the result as an example set

    Now you can run the process you posted either by loading the set twice and applying ExampleFilter, or by combining ExampleFilter with IOMultiplier.
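
To see why two separately loaded files can disagree, here is a minimal Python sketch (not RapidMiner code). It assumes, purely for illustration, that each nominal value receives an internal index in order of first appearance within the file being loaded; `build_mapping` and all variable names are hypothetical. The value lists are the 'example2' column copied from Input.arff and Prediction.arff above.

```python
def build_mapping(values):
    """Assign each distinct nominal value an index in order of first appearance.

    Assumption for illustration only: this mimics an index-based nominal
    mapping; it is not RapidMiner's actual implementation.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


# 'example2' column, row by row, from Input.arff (training data)
train_example2 = ["hello4", "hello6", "hello2", "hello3", "hello2", "hello5",
                  "hello2", "hello", "hello", "hello", "hello"]

# 'example2' column, row by row, from Prediction.arff (application data)
predict_example2 = ["hello", "hello2", "hello", "hello2", "hello", "hello2",
                    "hello3", "hello3", "hello4", "hello5", "hello6",
                    "hello2", "hello6"]

# Loading the files separately yields two independent mappings.
train_map = build_mapping(train_example2)
predict_map = build_mapping(predict_example2)

# Values whose index differs between the two mappings.
mismatched = sorted(v for v in train_map if train_map[v] != predict_map.get(v))

# If a model interpreted a prediction-side index with the training mapping,
# it would read the wrong value.
train_inverse = {index: value for value, index in train_map.items()}
misread = train_inverse[predict_map["hello"]]

# Workaround idea: merge first, so both subsets share one mapping.
shared_map = build_mapping(train_example2 + predict_example2)

if __name__ == "__main__":
    print("training mapping:  ", train_map)
    print("prediction mapping:", predict_map)
    print("mismatched values: ", mismatched)
    print("'hello' would be read as:", misread)
    print("shared mapping:    ", shared_map)
```

Under this assumption, 'hello' gets index 5 in the training data but index 0 in the prediction data, so an index-based lookup would confuse it with 'hello4'. Building one mapping from the merged data removes the disagreement, which is the idea behind merging the two sets and splitting them again with a marker attribute.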

    hope this was helpful

    regards,

    Steffen
  • Jorge
    Jorge New Altair Community Member
    Thanks a lot steffen

    It works perfectly now  :)
  • pathros
    pathros New Altair Community Member
    Steffen, I have the same problem, but in RapidMiner 5.0. I apply a model obtained from the "Optimize Selection (Evolutionary)" process and get the same warning:
    "WARNING: SimpleDistribution: The internal nominal mappings are not the same between training and application for attribute 'carrera'. This will probably lead to wrong results during model application."
    The prediction results also differ from those obtained in the split validation, which tells me that this warning really does lead to wrong results.

    But I can't find the operators you mention here:
    RM stores a mapping for nominal values which somehow affects the models. I suggest as workaround:
    -> Load both files, add an attribute marking it as train /prediction (AttributeConstruction and ChangeAttributeRole)
    -> Merge (ExampleSetMerge)
    -> save as exampleset

    How can I do this in RapidMiner 5.0?


    Without the optimizer, my process XML looks like this:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
        <process expanded="true" height="528" width="619">
          <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="vm_socdem_e_Xchanged" width="90" x="45" y="120">
            <parameter key="repository_entry" value="vm_socdem_e_Xchanged"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="SET ID" width="90" x="160" y="127">
            <parameter key="name" value="cuenta"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="Set Role" width="90" x="281" y="136">
            <parameter key="name" value="aprob_c"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes (2)" width="90" x="380" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="turno|raz_elec|no_unam|ingr_fi|esc_m|edad|carrera|a_ov|sost_ec|alg|geo_e|geo_a|qui|elec|X_sec|X_bach|bach|ENP|transp|dur_bach|trastes|refri|c_agua|tv_cable|horno_m|cel|inter|comp|auto_p|p_serv"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="5.0.10" expanded="true" height="76" name="Naive Bayes" width="90" x="447" y="165"/>
          <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="vm_socdem_e_Xchanged_prueba" width="90" x="45" y="300">
            <parameter key="repository_entry" value="vm_socdem_e_Xchanged_prueba"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="vm_socdem_prueba" width="90" x="112" y="435">
            <parameter key="name" value="cuenta"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="300">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="carrera|no_unam|edad|turno|raz_elec|ingr_fi|esc_m|a_ov|sost_ec|alg|geo_a|geo_e|elec|qui|X_sec|X_bach|bach|ENP|dur_bach|transp|refri|trastes|c_agua|cel|tv_cable|horno_m|comp|inter|auto_p|p_serv"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model" width="90" x="492" y="288">
            <list key="application_parameters"/>
            <parameter key="create_view" value="true"/>
          </operator>
          <connect from_op="vm_socdem_e_Xchanged" from_port="output" to_op="SET ID" to_port="example set input"/>
          <connect from_op="SET ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="vm_socdem_e_Xchanged_prueba" from_port="output" to_op="vm_socdem_prueba" to_port="example set input"/>
          <connect from_op="vm_socdem_prueba" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="216"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • land
    land New Altair Community Member
    Hi,
    The merge operator (ExampleSetMerge) is now called Append.

    Greetings,
      Sebastian