Setting Binominal Label to positive or negative

Cleo
Cleo New Altair Community Member
edited November 5 in Community Q&A
The binominal label in my process has the values “up” or “down”. When I use the operator Performance (Binominal Classification), I have the option to select “true positive”, “true negative”, etc.

How do I ensure the value “up” is set as “positive”? Sometimes RapidMiner chooses “up” to be positive and other times it is negative.

Cheers,
Cleo

Answers

  • B_Miner
    B_Miner New Altair Community Member
    I would be interested in what the default is, too. I would have guessed alphabetical order, but it doesn't sound like it if you experience it shifting.

    Can you not use the Remap Binominals operator (in the folder Nominal Value Modification) to set which value is to be considered positive? Give that a shot.
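
    A minimal sketch of how that operator might be configured in process XML for the "up"/"down" label from the question. The parameter keys follow the process posted further down in this thread; the attribute name `label` is an assumption, and the operator only has an effect if the attribute really has the binominal type.

    ```xml
    <!-- Sketch only: "label" is a hypothetical attribute name -->
    <operator activated="true" class="remap_binominals" name="Remap Binominals">
      <parameter key="attribute_filter_type" value="single"/>
      <parameter key="attribute" value="label"/>
      <!-- the label is a special attribute, so special attributes must be included -->
      <parameter key="include_special_attributes" value="true"/>
      <parameter key="negative_value" value="down"/>
      <parameter key="positive_value" value="up"/>
    </operator>
    ```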
  • Cleo
    Cleo New Altair Community Member
    Thanks B_Miner,

    The Remap Binominals operator works perfectly.

    Cheers,
    Cleo
  • land
    land New Altair Community Member
    Hi,
    Just as a side note until the problem has been fixed: if no other information is available, the order actually depends on the order in which the values occur in the data set being read. If you import the data once, this order is saved and fixed from now on.
    If you want to combine different data sets, you will have to use the mentioned solution.

    Greetings,
      Sebastian
  • Is there anything to pay attention to when using the Remap Binominals operator? For me, it has no effect.
  • land
    land New Altair Community Member
    Hi,
    you won't see any difference in the data itself, but the meaning might change. Some operators, like FP-Growth, need to know which value is positive and which is negative. Since there are too many possible value pairs, like true/false, 1/0, positive/negative, yes/no, we decided that the first element of the mapping will be treated as negative and the second as positive. As long as this isn't important for your process, you won't notice any difference.

    Greetings,
      Sebastian
  • But when I use "Remap Binominals", the order of the examples should not matter, right?
    I have a dataset with labels "true" and "false", and I want to use "Performance (Binominal Classification)" inside an XValidation to calculate precision and recall of the positive class, which should be "true". But most of the time the performance operator says "positive class: false", even though I inserted a "Remap Binominals" operator to map "false" to the negative value.
    The data is read by two "Read CSV" operators and combined via "Append". I made sure that the operator which reads the negative examples is executed before the other one and is connected to the first input port of "Append"; nevertheless, I have the described problem.
  • land
    land New Altair Community Member
    Hi,
    could you please post your process here? If possible, could you also send me your data? I will check it then.

    Greetings,
    Sebastian
  • dragoljub
    dragoljub New Altair Community Member
    I have also not been able to use Remap Binominals correctly. Does anyone have an example process with it working?

    Thanks,
    -Gagi
  • Below is a simplified version of my process.
    I sent you a link to my data via PM. I am using RapidMiner 5.0.005.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="463" width="1083">
          <operator activated="true" class="set_macros" expanded="true" height="60" name="Define Makros" width="90" x="45" y="30">
            <list key="macros">
              <parameter key="DataRoot" value="/home/jdoe/data/"/>
            </list>
          </operator>
          <operator activated="true" class="subprocess" expanded="true" height="76" name="Read Zd 15-25" width="90" x="45" y="120">
            <process expanded="true" height="463" width="844">
              <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV (7)" width="90" x="112" y="30">
                <description>Reads two CSV files and combines the example sets</description>
                <parameter key="file_name" value="%{DataRoot}/Hadron_4000_Zd_15_25.csv"/>
              </operator>
              <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV (8)" width="90" x="112" y="210">
                <description>Reads two CSV files and combines the example sets</description>
                <parameter key="file_name" value="%{DataRoot}/Gamma_4000_Zd_15_25.csv"/>
              </operator>
              <operator activated="true" class="append" expanded="true" height="94" name="Append (4)" width="90" x="315" y="30"/>
              <connect from_op="Read CSV (7)" from_port="output" to_op="Append (4)" to_port="example set 1"/>
              <connect from_op="Read CSV (8)" from_port="output" to_op="Append (4)" to_port="example set 2"/>
              <connect from_op="Append (4)" from_port="merged set" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Mark Id" width="90" x="246" y="120">
            <parameter key="name" value="idx"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Mark Label" width="90" x="374" y="120">
            <parameter key="name" value="ExpectedLabel"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="remap_binominals" expanded="true" height="76" name="Remap Binominals" width="90" x="514" y="120">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="ExpectedLabel"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="negative_value" value="Gamma"/>
            <parameter key="positive_value" value="Hadron"/>
          </operator>
          <operator activated="true" class="naive_bayes" expanded="true" height="76" name="Naive Bayes (2)" width="90" x="644" y="120"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="779" y="120">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" breakpoints="after" class="performance_binominal_classification" expanded="true" height="76" name="Performance (2)" width="90" x="916" y="120">
            <parameter key="main_criterion" value="accuracy"/>
            <parameter key="precision" value="true"/>
            <parameter key="recall" value="true"/>
          </operator>
          <connect from_op="Read Zd 15-25" from_port="out 1" to_op="Mark Id" to_port="example set input"/>
          <connect from_op="Mark Id" from_port="example set output" to_op="Mark Label" to_port="example set input"/>
          <connect from_op="Mark Label" from_port="example set output" to_op="Remap Binominals" to_port="example set input"/>
          <connect from_op="Remap Binominals" from_port="example set output" to_op="Naive Bayes (2)" to_port="training set"/>
          <connect from_op="Naive Bayes (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Naive Bayes (2)" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • land
    land New Altair Community Member
    Hi,
    the solution is comparatively easy:
    Your label simply isn't binominal, so the remapping operator can't do anything with it. I must admit that it should somehow notify you about this, and I have added the proper meta data testing to the operator. This will be available with the next version.

    To make your process work, you have to insert a Nominal to Binominal operator, as in the process below:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="463" width="1083">
          <operator activated="true" class="set_macros" expanded="true" height="60" name="Define Makros" width="90" x="45" y="30">
            <list key="macros">
              <parameter key="DataRoot" value="/home/jdoe/data/"/>
            </list>
          </operator>
          <operator activated="true" class="subprocess" expanded="true" height="76" name="Read Zd 15-25" width="90" x="45" y="120">
            <process expanded="true" height="463" width="844">
              <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV (7)" width="90" x="112" y="30">
                <description>Reads two CSV files and combines the example sets</description>
                <parameter key="file_name" value="C:\Dokumente und Einstellungen\sland\Desktop\Hadron_4000_Zd_15_25.csv"/>
              </operator>
              <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV (8)" width="90" x="112" y="210">
                <description>Reads two CSV files and combines the example sets</description>
                <parameter key="file_name" value="C:\Dokumente und Einstellungen\sland\Desktop\Gamma_4000_Zd_15_25.csv"/>
              </operator>
              <operator activated="true" class="append" expanded="true" height="94" name="Append (4)" width="90" x="315" y="30"/>
              <connect from_op="Read CSV (7)" from_port="output" to_op="Append (4)" to_port="example set 1"/>
              <connect from_op="Read CSV (8)" from_port="output" to_op="Append (4)" to_port="example set 2"/>
              <connect from_op="Append (4)" from_port="merged set" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Mark Id" width="90" x="185" y="120">
            <parameter key="name" value="idx"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Mark Label" width="90" x="313" y="120">
            <parameter key="name" value="ExpectedLabel"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="447" y="210">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="ExpectedLabel"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="remap_binominals" expanded="true" height="76" name="Remap Binominals" width="90" x="581" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="ExpectedLabel"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="negative_value" value="Gamma"/>
            <parameter key="positive_value" value="Hadron"/>
          </operator>
          <operator activated="true" class="naive_bayes" expanded="true" height="76" name="Naive Bayes (2)" width="90" x="648" y="75"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="782" y="75">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" breakpoints="after" class="performance_binominal_classification" expanded="true" height="76" name="Performance (2)" width="90" x="916" y="120">
            <parameter key="main_criterion" value="accuracy"/>
            <parameter key="precision" value="true"/>
            <parameter key="recall" value="true"/>
          </operator>
          <connect from_op="Read Zd 15-25" from_port="out 1" to_op="Mark Id" to_port="example set input"/>
          <connect from_op="Mark Id" from_port="example set output" to_op="Mark Label" to_port="example set input"/>
          <connect from_op="Mark Label" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Remap Binominals" to_port="example set input"/>
          <connect from_op="Remap Binominals" from_port="example set output" to_op="Naive Bayes (2)" to_port="training set"/>
          <connect from_op="Naive Bayes (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Naive Bayes (2)" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
      Sebastian
  • Hi,
    thanks for your reply. With the modified process I got it working.
    But the "Nominal to Binominal" operator has no effect unless I enable the option "transform binominal". So it seems to believe that the data is already binominal. With this option enabled, the process works. But I have one remark: the "Remap Binominals" operator should output a warning if the example set does not contain the specified attribute or if the attribute does not contain the specified values. Currently it just continues, which is very annoying if you have a typo in one of the fields.
  • land
    land New Altair Community Member
    Hi,
    this is exactly what I added to the code immediately :) By the way: you must NOT turn this parameter on unless you want to have your binominal attribute dichotomized. If you just want to change the attribute type, turn it off. If you set a breakpoint just after the operator, you will see that the type has changed, but nothing more.

    Greetings,
      Sebastian
  • No, the type does not change if I do not enable "transform binominal". If I do, I get those two new binominal attributes "Label = A" and "Label = B", which is also not exactly what I want.
  • land
    land New Altair Community Member
    Hi,
    I have posted a small process to RapidMiner's community extension. If you install this extension, you can open the process called "Correct Attribute Type to Binominal". Please take a look at it, and if it does not work as expected, update your RapidMiner to the latest version.

    Greetings,
      Sebastian
  • Thank you, now I got it working. For some reason, I had "create view" enabled in the operator. After disabling it, everything worked fine.
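
    Putting the thread's findings together, the conversion step might be configured roughly as follows. This is a sketch, not a tested process: the keys `create_view` and `transform_binominal` are assumed to be the XML names of the "create view" and "transform binominal" options discussed above, and the attribute and value names are copied from the posted process.

    ```xml
    <!-- Sketch only: create_view and transform_binominal are assumed key names -->
    <operator activated="true" class="nominal_to_binominal" name="Nominal to Binominal">
      <!-- both options switched off, as resolved in this thread -->
      <parameter key="create_view" value="false"/>
      <parameter key="attribute_filter_type" value="single"/>
      <parameter key="attribute" value="ExpectedLabel"/>
      <parameter key="include_special_attributes" value="true"/>
      <parameter key="transform_binominal" value="false"/>
    </operator>
    <operator activated="true" class="remap_binominals" name="Remap Binominals">
      <parameter key="attribute_filter_type" value="single"/>
      <parameter key="attribute" value="ExpectedLabel"/>
      <parameter key="include_special_attributes" value="true"/>
      <parameter key="negative_value" value="Gamma"/>
      <parameter key="positive_value" value="Hadron"/>
    </operator>
    <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Remap Binominals" to_port="example set input"/>
    ```

    Setting a breakpoint after "Nominal to Binominal", as suggested earlier in the thread, is one way to check that the attribute type actually changed before the remapping runs.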