"Error using Adaboost"

inthewoods New Altair Community Member
edited November 5 in Community Q&A
I get the following error when I try to run a simulation with the AdaBoost operator:

Exception: com.rapidminer.example.AttributeTypeException
Message: Cannot map index of nominal attribute to nominal value: index 4 is out of bounds!
Stack trace:

  com.rapidminer.example.table.PolynominalMapping.mapIndex(PolynominalMapping.java:137)
  com.rapidminer.operator.learner.meta.AdaBoostModel.evaluateSpecialAttributes(AdaBoostModel.java:231)
  com.rapidminer.operator.learner.meta.AdaBoostModel.performPrediction(AdaBoostModel.java:166)
  com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
  com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.Process.run(Process.java:899)
  com.rapidminer.Process.run(Process.java:795)
  com.rapidminer.Process.run(Process.java:790)
  com.rapidminer.Process.run(Process.java:780)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)

Here's the setup:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
    <process expanded="true" height="341" width="605">
      <operator activated="true" class="retrieve" compatibility="5.0.11" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//NewLocalRepository/SPY_test_data"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.0.11" expanded="true" height="60" name="Retrieve (2)" width="90" x="99" y="164">
        <parameter key="repository_entry" value="//NewLocalRepository/SPY_apply_model"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing" width="90" x="179" y="30">
        <parameter key="horizon" value="1"/>
        <parameter key="window_size" value="1"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="ROC-1"/>
      </operator>
      <operator activated="true" class="adaboost" compatibility="5.0.11" expanded="true" height="76" name="AdaBoost" width="90" x="357" y="35">
        <process expanded="true" height="315" width="605">
          <operator activated="true" class="parallel:decision_tree_weight_based_parallel" compatibility="5.0.1" expanded="true" height="60" name="DecisionTree (Weight-Based)" width="90" x="243" y="58">
            <process expanded="true">
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_weights" spacing="0"/>
            </process>
          </operator>
          <connect from_port="training set" to_op="DecisionTree (Weight-Based)" to_port="training set"/>
          <connect from_op="DecisionTree (Weight-Based)" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.0.11" expanded="true" height="76" name="Apply Model (2)" width="90" x="380" y="210">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Windowing" from_port="example set output" to_op="AdaBoost" to_port="training set"/>
      <connect from_op="AdaBoost" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I've submitted a bug report, but I was wondering whether anyone has any insight into what I'm doing wrong.

Thanks!

Answers

  • land
    land New Altair Community Member
    Hi,
    I guess you have different nominal values in your two data sets. I admit this shouldn't cause any problems, but if you first combine both data sets and then split them into a training set and a test set, this error won't happen.

    Greetings,
    Sebastian
  • inthewoods
    inthewoods New Altair Community Member
    If you look at the way I've set it up, I have two different data sets. I'm feeding in a test data set, outputting a model, and applying that model to a new data set. So I don't think what you've highlighted is the problem. Other thoughts?
  • inthewoods
    inthewoods New Altair Community Member
    Whoops, Sebastian, I misread what you wrote. To answer your question, the two data sets contain the same data, but I don't know what you mean by having different nominal values. I'm afraid my level of math isn't high enough to understand the definition!
  • land
    land New Altair Community Member
    Hi,
    Actually, nominal values have nothing to do with math: nominal values are non-numerical values such as words. Here is what can happen:
    Suppose your training set contains examples of things in two different colors, "red" and "green". What happens if the color "blue" now appears in the test set? This value is unknown to any model, because the model simply cannot know that it exists. This is a general problem, and all the model can (and definitely should) do is throw a better, more detailed error message.

    To avoid this problem, append one data set to the other and split them again. Then both data sets know which values exist in the combined data, and the model will cope with this.

    Anyway, I will look for the cause of this crash right now.

    Greetings,
    Sebastian
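The mismatch Sebastian describes can be sketched with a toy model. The `NominalMapping` class, the `translate` helper and the sample color values below are hypothetical: they only assume that each data set numbers its nominal values in order of first appearance, and they are not RapidMiner's actual `PolynominalMapping` code. The sketch shows how an index produced with one data set's mapping can be out of bounds in another data set's mapping, and how building a single mapping from the appended data avoids that.

```python
# Toy model of per-data-set nominal mappings (hypothetical; not RapidMiner code).


class NominalMapping:
    """Assigns each distinct nominal value an index in order of first appearance."""

    def __init__(self, values):
        self._index_to_value = []
        self._value_to_index = {}
        for value in values:
            if value not in self._value_to_index:
                self._value_to_index[value] = len(self._index_to_value)
                self._index_to_value.append(value)

    def map_string(self, value):
        """Value -> index; raises KeyError for a value this mapping never saw."""
        return self._value_to_index[value]

    def map_index(self, index):
        """Index -> value; raises IndexError if the index is out of bounds."""
        if not 0 <= index < len(self._index_to_value):
            raise IndexError(f"index {index} is out of bounds!")
        return self._index_to_value[index]


def translate(value, source, target):
    """Encode a value with one mapping and decode the index with another."""
    return target.map_index(source.map_string(value))


# Hypothetical data: "blue" appears only in the test set.
train = ["red", "green", "green", "red"]
test = ["green", "blue", "red"]

# Separate mappings: red=0, green=1 (train) vs. green=0, blue=1, red=2 (test).
train_map = NominalMapping(train)
test_map = NominalMapping(test)

try:
    translate("red", test_map, train_map)  # test index 2 does not exist in train_map
except IndexError as err:
    print("separate mappings:", err)

# The workaround from the thread: append one set to the other, build one
# mapping from the combined data, then split back into the original parts.
combined = train + test
shared_map = NominalMapping(combined)
test_part = combined[len(train):]

print("shared mapping:", [translate(v, shared_map, shared_map) for v in test_part])
```

In this toy model, mismatched mappings do not always fail loudly: `translate("green", test_map, train_map)` silently returns `"red"`, because the two mappings number their values differently. A single mapping built from the combined data gives every value the same index in both parts.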