Decision tree: ID3 operator throws an error

yakaryos
yakaryos New Altair Community Member
edited November 5 in Community Q&A
Hi,

I have different types of data and I have to use the ID3 classification algorithm, because it gives the best trees for my work.
However, some of my data doesn't work with ID3, although the same data does work with the Decision Tree and CHAID algorithms.

My data is here:

http://rapidshare.com/files/383023814/insan_rp.xls.html

And my process is (it's simple):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" expanded="true" name="Process">
   <process expanded="true" height="359" width="614">
     <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
       <parameter key="repository_entry" value="//My Repository/29.04 rpler/insan"/>
     </operator>
     <operator activated="true" class="free_memory" expanded="true" height="76" name="Free Memory" width="90" x="179" y="75"/>
     <operator activated="true" class="id3" expanded="true" height="76" name="ID3" width="90" x="246" y="255"/>
     <connect from_op="Retrieve" from_port="output" to_op="Free Memory" to_port="through 1"/>
     <connect from_op="Free Memory" from_port="through 1" to_op="ID3" to_port="training set"/>
     <connect from_op="ID3" from_port="model" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
And this is the error message:

Exception: java.lang.RuntimeException
Message: Cannot clone com.rapidminer.example.set.SplittedExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.RuntimeException: Cannot clone com.rapidminer.example.set.SplittedExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.OutOfMemoryError: Java heap space. Cause: java.lang.OutOfMemoryError: Java heap space.. Cause: java.lang.RuntimeException: Cannot clone com.rapidminer.example.set.SplittedExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.OutOfMemoryError: Java heap space. Cause: java.lang.OutOfMemoryError: Java heap space..
Stack trace:

  com.rapidminer.example.set.AbstractExampleSet.clone(AbstractExampleSet.java:390)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:208)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)
  com.rapidminer.operator.learner.tree.TreeBuilder.buildTree(TreeBuilder.java:220)

Best regards.

Answers

  • land
    land New Altair Community Member
    Hi,
    unfortunately I wasn't able to test it with your data, because I simply didn't know which attribute is the label :)
    But the ID3 operator works for me. Are you using the latest RapidMiner 5 version, 5.0.005?

    Greetings,
      Sebastian
  • yakaryos
    yakaryos New Altair Community Member
    Hi,
    thanks for your attention.

    Yes, I use the latest version of RapidMiner. My data's label is c100 and the ID is c5. If you have some time to try it, I will be very happy.

    Best regards.
  • B_Miner
    B_Miner New Altair Community Member
    I was able to run it fine (after assigning the label and ID roles). By the looks of the error, you are running out of memory. Try splitting your training set and building on only part of it (and buy a lot more RAM and a 64-bit OS ;-)

    Here I am building on 50% of the data and validating on the other 50%.


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="386" width="801">
          <operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="14" y="38">
            <parameter key="excel_file" value="C:\Documents and Settings\Owner\My Documents\Downloads\insan_rp.xls"/>
            <list key="annotations"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="45" y="165">
            <parameter key="name" value="c100 insan hatas? m?"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="179" y="165">
            <parameter key="name" value="c5 numara"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="split_validation" expanded="true" height="112" name="Validation" width="90" x="380" y="165">
            <parameter key="split_ratio" value="0.5"/>
            <process expanded="true" height="405" width="306">
              <operator activated="true" class="id3" expanded="true" height="76" name="ID3" width="90" x="83" y="154"/>
              <connect from_port="training" to_op="ID3" to_port="training set"/>
              <connect from_op="ID3" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="405" width="306">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="75">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="168" y="147">
                <parameter key="use_example_weights" value="false"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
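    The "build on 50%, validate on the other 50%" idea can be sketched outside RapidMiner as a seeded shuffle followed by a cut at split_ratio × n. This is only an illustration that assumes a plain shuffled split; RapidMiner's Split Validation operator may sample differently (for example, stratified by label), and the class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitSketch {

    /** Holds the two partitions produced by a split (hypothetical helper type). */
    public static final class Split<T> {
        public final List<T> training;
        public final List<T> test;

        Split(List<T> training, List<T> test) {
            this.training = training;
            this.test = test;
        }
    }

    /**
     * Shuffles a copy of the examples with a fixed seed, puts the first
     * round(ratio * n) into the training partition and the rest into the test partition.
     */
    public static <T> Split<T> split(List<T> examples, double ratio, long seed) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(examples); // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(ratio * shuffled.size());
        return new Split<>(new ArrayList<>(shuffled.subList(0, cut)),
                           new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 15000; i++) {
            ids.add(i); // stand-in for 15,000 example IDs
        }
        Split<Integer> s = split(ids, 0.5, 42L);
        System.out.println(s.training.size() + " training / " + s.test.size() + " test");
    }
}
```

    With 15,000 examples and split_ratio 0.5 this gives 7,500 examples per partition; with 0.25 only 3,750 go to training. Only the training partition is handed to the learner, which is what reduces memory use while the tree is built.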


  • B_Miner
    B_Miner New Altair Community Member
    FYI: I got around 75% accuracy training on just 25% of the data with Naive Bayes (5 points better than ID3).


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="386" width="801">
          <operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="14" y="38">
            <parameter key="excel_file" value="C:\Documents and Settings\Owner\My Documents\Downloads\insan_rp.xls"/>
            <list key="annotations"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="45" y="165">
            <parameter key="name" value="c100 insan hatas? m?"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="179" y="165">
            <parameter key="name" value="c5 numara"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="split_validation" expanded="true" height="112" name="Validation" width="90" x="380" y="165">
            <parameter key="split_ratio" value="0.25"/>
            <process expanded="true" height="405" width="306">
              <operator activated="false" class="id3" expanded="true" height="76" name="ID3" width="90" x="112" y="30"/>
              <operator activated="false" class="random_forest" expanded="true" height="76" name="Random Forest" width="90" x="112" y="120">
                <parameter key="number_of_trees" value="25"/>
              </operator>
              <operator activated="false" class="adaboost" expanded="true" height="76" name="AdaBoost" width="90" x="112" y="210">
                <parameter key="iterations" value="20"/>
                <process expanded="true" height="368" width="644">
                  <operator activated="false" class="decision_stump" expanded="true" height="76" name="Decision Stump" width="90" x="100" y="157"/>
                  <connect from_port="training set" to_op="Decision Stump" to_port="training set"/>
                  <connect from_op="Decision Stump" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="naive_bayes" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="300"/>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="405" width="306">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="75">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" expanded="true" height="76" name="Performance" width="90" x="168" y="147">
                <parameter key="use_example_weights" value="false"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
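    For reference, the accuracy figures compared here are the share of test examples whose predicted label matches the true label; "5 points better" than about 75% means ID3 reached roughly 70% on its split. A minimal sketch with hypothetical class and method names and made-up labels:

```java
import java.util.List;

public class AccuracySketch {

    /** Fraction of examples whose predicted label equals the true label. */
    public static double accuracy(List<String> actual, List<String> predicted) {
        if (actual.size() != predicted.size() || actual.isEmpty()) {
            throw new IllegalArgumentException("need two non-empty lists of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / actual.size();
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only.
        List<String> actual = List.of("yes", "no", "yes", "yes");
        List<String> predicted = List.of("yes", "no", "no", "yes");
        System.out.println(accuracy(actual, predicted)); // 3 of 4 predictions are correct
    }
}
```

    For the comparison between learners to be meaningful, both should be evaluated on the same test partition (same split ratio and random seed).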


  • yakaryos
    yakaryos New Altair Community Member
    Hi,
    my error message is different: my computer doesn't give an out-of-memory warning. I posted it above in my first message.
    I tried Naive Bayes and the result is good, but I need to see the tree as a graph and as text, so I want to use the ID3 algorithm.
    So I don't understand the problem. My computer has two processors and 1.5 GB of RAM, running Windows 7 Ultimate.
    Best regards
  • land
    land New Altair Community Member
    Hi,
    did you try to restrict the maximal depth of the tree and let it stop with larger leaves? The other question is how much memory RapidMiner actually has. Please take a look at the memory monitor in the result view. What's the maximum available memory?

    Greetings,
     Sebastian
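    To illustrate why limiting depth and leaf size can help: a recursive ID3-style builder creates new partitions of the examples at every level, and the stack trace above shows TreeBuilder.buildTree calling itself repeatedly, with the failure occurring while cloning an example set. The sketch below is a hypothetical, simplified ID3 for nominal attributes, not RapidMiner's implementation; maxDepth and minLeafSize stop the recursion before the tree, and the partitions held in memory, grow any further.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Id3StoppingSketch {

    /** Leaf (attribute == -1) or inner node splitting on one nominal attribute. */
    public static final class Node {
        public String label;       // majority class of the examples reaching this node
        public int attribute = -1; // index of the split attribute, -1 for leaves
        public final Map<String, Node> children = new HashMap<>();

        public boolean isLeaf() { return attribute < 0; }
    }

    // Each example is a String[]: nominal attribute values, then the class label last.
    static Map<String, Integer> classCounts(List<String[]> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) {
            counts.merge(r[r.length - 1], 1, Integer::sum);
        }
        return counts;
    }

    static double entropy(List<String[]> rows) {
        double h = 0.0;
        for (int c : classCounts(rows).values()) {
            double p = (double) c / rows.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    static Map<String, List<String[]>> partition(List<String[]> rows, int attr) {
        Map<String, List<String[]>> parts = new HashMap<>();
        for (String[] r : rows) {
            parts.computeIfAbsent(r[attr], k -> new ArrayList<>()).add(r);
        }
        return parts;
    }

    /** Recursive ID3; maxDepth and minLeafSize end the recursion early. */
    public static Node build(List<String[]> rows, List<Integer> attrs,
                             int depth, int maxDepth, int minLeafSize) {
        Node node = new Node();
        node.label = classCounts(rows).entrySet().stream()
                .max(Map.Entry.comparingByValue()).get().getKey();
        double parentEntropy = entropy(rows);
        // Stop: pure node, no attributes left, depth limit, or too few examples for two leaves.
        if (parentEntropy == 0.0 || attrs.isEmpty() || depth >= maxDepth
                || rows.size() < 2 * minLeafSize) {
            return node;
        }
        int best = -1;
        double bestGain = 0.0;
        for (int a : attrs) {
            Map<String, List<String[]>> parts = partition(rows, a);
            // Skip splits that would create a leaf smaller than minLeafSize.
            if (parts.values().stream().anyMatch(part -> part.size() < minLeafSize)) {
                continue;
            }
            double remainder = 0.0;
            for (List<String[]> part : parts.values()) {
                remainder += (double) part.size() / rows.size() * entropy(part);
            }
            if (parentEntropy - remainder > bestGain) {
                bestGain = parentEntropy - remainder;
                best = a;
            }
        }
        if (best < 0) {
            return node; // no admissible split with positive information gain
        }
        node.attribute = best;
        List<Integer> rest = new ArrayList<>(attrs);
        rest.remove(Integer.valueOf(best)); // each nominal attribute is used once per path
        for (Map.Entry<String, List<String[]>> e : partition(rows, best).entrySet()) {
            node.children.put(e.getKey(), build(e.getValue(), rest, depth + 1, maxDepth, minLeafSize));
        }
        return node;
    }

    public static int countLeaves(Node n) {
        return n.isLeaf() ? 1 : n.children.values().stream().mapToInt(Id3StoppingSketch::countLeaves).sum();
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of( // tiny made-up data set
                new String[] {"a", "x", "yes"}, new String[] {"a", "y", "yes"},
                new String[] {"b", "x", "no"}, new String[] {"b", "y", "no"},
                new String[] {"a", "x", "yes"}, new String[] {"b", "y", "no"},
                new String[] {"a", "y", "yes"});
        for (int minLeaf : new int[] {1, 4}) {
            Node root = build(rows, List.of(0, 1), 0, 10, minLeaf);
            System.out.println("minLeafSize=" + minLeaf + ", leaves: " + countLeaves(root));
        }
    }
}
```

    Raising minLeafSize or lowering maxDepth yields fewer, larger leaves, trading some fit on the training data for a smaller tree. Whether that improves accuracy on unseen data has to be checked with a validation, such as the split validation used in this thread.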
  • yakaryos
    yakaryos New Altair Community Member
    Hi,
    I restricted the maximal depth but nothing changed. I reduced the minimal size for split, the leaf size and the gain ratio, but I still see the same error.
    When I set the leaf size to 8, the program produced a result. (Does this make my solution better or worse? I can't decide. What do you think?)
    In RapidMiner, the system monitor shows max: 773 MB and total: 773 MB.
    And my computer's (Windows 7 Ultimate) RAM indicator shows 85% in use, so there is still 15% free RAM for RapidMiner.

    Does this answer help find a solution?

    Best regards.
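    The max and total values in the memory monitor presumably reflect the JVM heap, which is what the "OutOfMemoryError: Java heap space" in the original trace refers to. With total already equal to max (773 MB), the heap cannot grow any further, no matter how much physical RAM Windows still reports as free. The standard JVM option -Xmx sets that ceiling; how a particular RapidMiner installation passes it to the JVM is not covered here. The same kind of figures can be read from Java through the standard java.lang.Runtime API (class name HeapInfo is hypothetical):

```java
public class HeapInfo {

    /** Converts bytes to whole megabytes (1 MB = 1024 * 1024 bytes). */
    public static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper limit the heap may grow to (set by -Xmx); when live objects no longer fit,
        // allocation fails with java.lang.OutOfMemoryError: Java heap space.
        long max = rt.maxMemory();
        // Heap currently reserved by the JVM, and the part of it actually in use.
        long total = rt.totalMemory();
        long used = total - rt.freeMemory();
        System.out.println("max:   " + toMb(max) + " MB");
        System.out.println("total: " + toMb(total) + " MB");
        System.out.println("used:  " + toMb(used) + " MB");
        if (total >= max) {
            System.out.println("The heap is already at its maximum size; it cannot grow any further.");
        }
    }
}
```

    Note that Runtime.maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.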
  • B_Miner
    B_Miner New Altair Community Member
    Looking at your error, yakaryos, I am not sure why you think it is not a memory issue. It says "java.lang.OutOfMemoryError: Java heap space" several times, i.e. the JVM ran out of heap.

    Did you try sampling the data, building on a small percentage (say 10%), and seeing whether the tree builds?


    From your posting:
    Exception: java.lang.RuntimeException
    Message: Cannot clone com.rapidminer.example.set.SplittedExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.RuntimeException: Cannot clone com.rapidminer.example.set.SplittedExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.OutOfMemoryError: Java heap space. Cause: java.lang.OutOfMemoryError: Java heap space.. Cause: java.lang.RuntimeException: Cannot clone com.rapidminer.example.set.SplittedExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.OutOfMemoryError: Java heap space. Cause: java.lang.OutOfMemoryError: Java heap space..
    Stack trace:
  • yakaryos
    yakaryos New Altair Community Member
    Hi,
    I tried with a 10% sample and it didn't give any problem,
    and I built the tree. But I have 15,000 examples, so I want to use all of them.

    Best regards

    P.S. Don't forget to vote for RapidMiner at http://www.kdnuggets.com
  • yakaryos
    yakaryos New Altair Community Member
    Hi,

    I tried my project on a PC with 3 GB of RAM and 4 processors and got results. It works, but the tree has lots of leaves; it's a huge tree.
    In the end, a workstation solved the problem. :)
    Best regards.
  • Legacy User
    Legacy User New Altair Community Member
    Hi,
    I have an XML process like the one below:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="686" width="974">
          <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="219" y="270">
            <parameter key="csv_file" value="C:\Documents and Settings\rzkl07\RapidMinor5\golfTrain.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="use_quotes" value="false"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Outlook.true.polynominal.attribute"/>
              <parameter key="1" value="Temperature.true.integer.attribute"/>
              <parameter key="2" value="Humidity.true.integer.attribute"/>
              <parameter key="3" value="Windy.true.binominal.attribute"/>
              <parameter key="4" value="Play.true.binominal.label"/>
            </list>
          </operator>
          <operator activated="true" class="decision_tree" compatibility="5.1.006" expanded="true" height="76" name="Decision Tree" width="90" x="425" y="275"/>
          <connect from_op="Read CSV" from_port="output" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>



    I need to display the decision tree by loading and running this XML process from Java:

    Process trainData = new Process(new File("C:\\Documents and Settings\\rzkl07\\RapidMinor5\\golfTrain.xml"));
    IOContainer container = trainData.run();
    System.out.println(container.toString());


    RESULT:IOContainer (1 objects):

    Outlook = overcast: TRUE {FALSE=0, TRUE=4}
    Outlook = rain
    |  Windy = FALSE: TRUE {FALSE=0, TRUE=3}
    |  Windy = TRUE: FALSE {FALSE=2, TRUE=0}
    Outlook = sunny
    |  Humidity > 77.500: FALSE {FALSE=3, TRUE=0}
    |  Humidity ≤ 77.500: TRUE {FALSE=0, TRUE=2}
    (created by Decision Tree)


    But I need to show it as a graph.
    How can I do that from Java?
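    As an aside on the printed tree: the numbers in curly braces are the class counts at each leaf. Summed per Outlook branch they give (TRUE, FALSE) = (4, 0) for overcast, (3, 2) for rain and (2, 3) for sunny. The sketch below computes the information gain of that root split, which is the criterion ID3 uses to choose splits. The Decision Tree operator that produced this tree may be configured with a different criterion (such as gain ratio), so this only illustrates the ID3 calculation; the class name is hypothetical.

```java
public class InfoGainSketch {

    /** Shannon entropy (in bits) of a class distribution given as counts. */
    public static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // 0 * log(0) is taken as 0
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Information gain of a split; each row holds the class counts of one branch. */
    public static double informationGain(int[][] branches) {
        int classes = branches[0].length;
        int[] parent = new int[classes];
        int total = 0;
        for (int[] b : branches) {
            for (int k = 0; k < classes; k++) {
                parent[k] += b[k];
                total += b[k];
            }
        }
        double remainder = 0.0; // weighted entropy of the branches
        for (int[] b : branches) {
            int size = 0;
            for (int c : b) {
                size += c;
            }
            remainder += (double) size / total * entropy(b);
        }
        return entropy(parent) - remainder;
    }

    public static void main(String[] args) {
        // (TRUE, FALSE) counts per Outlook branch: overcast, rain, sunny.
        int[][] outlook = { {4, 0}, {3, 2}, {2, 3} };
        System.out.printf("entropy before split: %.3f%n", entropy(9, 5));
        System.out.printf("gain of Outlook:      %.3f%n", informationGain(outlook));
    }
}
```

    Running it prints 0.940 for the entropy of the full set (9 TRUE, 5 FALSE) and 0.247 for the gain of the Outlook split.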
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    first of all: please don't cross-post! I have just removed the same post from three different places, none of which was correct. And please start a new topic in the correct board if you have a new subject - the correct board would have been the one for "Development"...

    Don't send private messages either - I usually just ignore them and also do not answer the corresponding forum posts (if there are any).

    Ok, now the answer: You can use the class "TreeModelGraphRenderer" for this (use the TreeModel as 'Renderable'). Or even better, let RapidMiner create and deliver the visualization itself by using the class "RendererService".

    Cheers,
    Ingo
  • Legacy User
    Legacy User New Altair Community Member
    Hi Ingo,

    many thanks for your reply,
    and apologies for sending a private message.

    As you suggested, I have used RendererService:

    IOObject io = ioResult.getElementAt(0);
    TreeModel model = (TreeModel) ioResult.getElementAt(0);
    Renderer createRenderer = RendererService.createRenderer(io, "Graph View");
    Component visualizationComponent = createRenderer.getVisualizationComponent(io, ioResult);
    Image createImage = visualizationComponent.createImage(100, 100);
    Graphics graphics = createImage.getGraphics();


    createImage always returns null.
    I couldn't figure out why.
    What could be the issue?

    Please bear with me.

    Thanks in advance
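    A likely cause, assuming the component returned by the renderer is an ordinary Swing/AWT component that has not been added to a visible window: the AWT documentation of Component.createImage(int, int) says it may return null when the component is not displayable, and that this always happens in headless mode. A common workaround is to give the component a size, lay it out, and paint it into a BufferedImage you allocate yourself. The sketch below uses a plain JPanel as a stand-in for the RapidMiner graph component (a hypothetical substitution); whether the RapidMiner graph view renders completely this way, and what size it needs, is untested here.

```java
import java.awt.Color;
import java.awt.Component;
import java.awt.Container;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import javax.swing.JPanel;

public class OffscreenRender {

    /** Lays out a component tree that has not been added to any window. */
    static void layoutRecursively(Component c) {
        c.doLayout();
        if (c instanceof Container) {
            for (Component child : ((Container) c).getComponents()) {
                layoutRecursively(child);
            }
        }
    }

    /**
     * Paints a component into an image allocated here, so the component
     * does not have to be displayable (unlike Component.createImage).
     */
    public static BufferedImage renderToImage(Component component, int width, int height) {
        component.setSize(width, height); // Swing components with zero size paint nothing
        layoutRecursively(component);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            component.paint(g);
        } finally {
            g.dispose(); // release the graphics context
        }
        return image;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the component returned by getVisualizationComponent.
        JPanel panel = new JPanel();
        panel.setBackground(Color.ORANGE);

        // Not added to a visible window, so createImage returns null, as described above.
        System.out.println("createImage: " + panel.createImage(100, 100));

        BufferedImage image = renderToImage(panel, 100, 100);
        System.out.println("rendered " + image.getWidth() + "x" + image.getHeight());
    }
}
```

    To apply this to the snippet above, one would pass visualizationComponent instead of the JPanel and could then save the result, for example with javax.imageio.ImageIO.write(image, "png", file).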