X-Val (Parallel Version)

jlo
jlo New Altair Community Member
edited November 5 in Community Q&A
I'm trying to use the X-Validation (Parallel) operator, but I'm getting errors even for very simple processes. I've run the program on three different machines (2-core and 4-core) with the same result: I get a "Process Failed. Send bug report?" dialog.

Am I forgetting something obvious and embarrassing?

I don't want to submit a bug report if I'm the one making a silly mistake.

Thanks for any help,

jlo

Code below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
   <process expanded="true" height="197" width="413">
     <operator activated="true" class="generate_direct_mailing_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Direct Mailing Data" width="90" x="112" y="63"/>
     <operator activated="true" class="parallel:x_validation_parallel" compatibility="5.0.001" expanded="true" height="112" name="Validation" width="90" x="313" y="75">
       <process expanded="true">
         <operator activated="true" class="decision_tree" compatibility="5.0.000" expanded="true" height="76" name="Decision Tree" width="90" x="45" y="30"/>
         <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
         <connect from_op="Decision Tree" from_port="model" to_port="model"/>
         <portSpacing port="source_training" spacing="0"/>
         <portSpacing port="sink_model" spacing="0"/>
         <portSpacing port="sink_through 1" spacing="0"/>
       </process>
       <process expanded="true">
         <operator activated="true" class="apply_model" compatibility="5.0.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance" compatibility="5.0.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
         <connect from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
         <portSpacing port="source_model" spacing="0"/>
         <portSpacing port="source_test set" spacing="0"/>
         <portSpacing port="source_through 1" spacing="0"/>
         <portSpacing port="sink_averagable 1" spacing="0"/>
         <portSpacing port="sink_averagable 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Generate Direct Mailing Data" from_port="output" to_op="Validation" to_port="training"/>
     <connect from_op="Validation" from_port="model" to_port="result 1"/>
     <connect from_op="Validation" from_port="training" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hi jlo,

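    No, this does indeed seem to be a bug; thanks for pointing it out. By the way, the problem is not the X-Validation itself but the model that is learned at the end on the complete data set. Until the bug gets fixed, the following process can be used as a workaround. It leaves the validation's model output unconnected and instead trains the final model with a separate Decision Tree operator on the data from the validation's "training" output port: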

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.001">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
       <process expanded="true" height="197" width="547">
         <operator activated="true" class="generate_direct_mailing_data" compatibility="5.1.001" expanded="true" height="60" name="Generate Direct Mailing Data" width="90" x="45" y="75"/>
         <operator activated="true" class="parallel:x_validation_parallel" compatibility="5.0.001" expanded="true" height="112" name="Validation" width="90" x="179" y="75">
           <parameter key="parallelize_training" value="true"/>
           <process expanded="true" height="762" width="564">
             <operator activated="true" class="decision_tree" compatibility="5.1.001" expanded="true" height="76" name="Decision Tree" width="90" x="45" y="30"/>
             <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
             <connect from_op="Decision Tree" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true" height="762" width="564">
             <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="performance" compatibility="5.1.001" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="decision_tree" compatibility="5.1.001" expanded="true" height="76" name="Decision Tree (2)" width="90" x="313" y="30"/>
         <connect from_op="Generate Direct Mailing Data" from_port="output" to_op="Validation" to_port="training"/>
         <connect from_op="Validation" from_port="training" to_op="Decision Tree (2)" to_port="training set"/>
         <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
         <connect from_op="Decision Tree (2)" from_port="model" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="72"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
    I have added the bug to the bug tracker:

    http://bugs.rapid-i.com/show_bug.cgi?id=460

    If you are an Enterprise Customer, please contact us and we will handle the error with high priority.

    Cheers and thanks again,
    Ingo
  • jlo
    jlo New Altair Community Member
    Hi Ingo:

    Thanks a million for your help. Your clever workaround is good enough for me.

    I have a quick question: Is there a page on the RM site where one can check what bugs have been fixed or what's new in a new release (5.0 vs 5.1)? [I'm aware of the existence of bugs.rapid-i.com.]

    Sometimes new features become available and one is not aware of them until one asks a question here and somebody says, "Well, that has been available since version X.x."

    Thanks again,

    jlo

  • IngoRM
    IngoRM New Altair Community Member
    Hi jlo,

    "I have a quick question: Is there a page on the RM site where one can check what bugs have been fixed or what's new in a new release (5.0 vs 5.1)?"

    Each release comes with a file called "CHANGES.txt" where at least the major changes and fixes should be listed. I have to admit that we, ahem, forgot to maintain this file during the last 12 months, but in the future we will again try to think of each cool new feature and write it down there.

    Cheers,
    Ingo

    P.S.: Below are the ones for the latest release:

    Changes from RapidMiner 5.0 to 5.1
    ----------------------------------

    * Added RapidAnalytics connectivity
    * Added new repository type that reflects database connections
    * Added type-specific icons to repository tree
    * Added annotations to IOObjects
    * Import operators and wizards remake
    * Most wanted feature: "Rename" and "Set Role" can handle multiple attributes at a time
    * Versioned operators allow easier updates
    * "Generate Attributes" has new UI and supports more text and date functions
    * Operator documentation uses Wiki (http://rapid-i.com/wiki/).
    * IOObjects can be annotated, e.g. with file source or SQL statement
    * Added new Operators:
      - Print to Console
      - Unset Macro
      - "Auto MLP" and "k-Means (fast)" contributed by DFKI
      - Hierarchical Classification
      - Numerical to Date
      - Delay
    * Database operators can prepare statements
    * Revised import wizards
    * Background tasks stoppable
    * Added process profiling and resource consumption annotations
    * Added Support for R Extension
    * Added new boolean GUI property rapidminer.gui.fetch_data_base_table_names which suppresses fetching of database table names in the SQLQueryBuilder
    * More efficient meta data handling for Excel, CSV, and database readers
    * Meta data propagation uses context macros
    * Splash screen shows plugins
    * Aggregate operator can compute product
    * Various smaller fixes
    * Various UI improvements
     
    Major Bugfixes:
    * Fixed memory leak causing RapidMiner to run out of memory when processing many large example sets
    * Re-added descriptive error messages
  • land
    land New Altair Community Member
    Hi,
    this bug has been fixed.

    Greetings,
      Sebastian