"Loops / Iterations"

Armen
Armen New Altair Community Member
edited November 5 in Community Q&A
I consider loops/iterations one of the most interesting tools in RapidMiner.

I am trying to use ClusterLoop, but I ran into some strange behaviour while using it.

In this example (see code), a ClusterLoop is set on the cluster attribute DistrictName. The ClusterLoop node contains a simple Replace node, which should create “new attributes from nominal attributes with replaced substrings”. The Replace node is applied to the attribute dataset, whose values are “train” and “test”.

After switching debug mode on, I start the process and check the results:

- the loop cycles through the different clusters, which is great!
- the values “test” are transformed into “TTTesTTT”, as expected
- the values “train” are transformed into “TTTesTTT” too… a bug?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="521" width="1016">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
        <parameter key="repository_entry" value="data2"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID|dataset"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
        <parameter key="name" value="DistrictName"/>
        <parameter key="target_role" value="cluster"/>
      </operator>
      <operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
        <process expanded="true" height="502" width="1059">
          <operator activated="true" class="replace" expanded="true" height="76" name="Replace" width="90" x="447" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="dataset"/>
            <parameter key="replace_what" value="t"/>
            <parameter key="replace_by" value="TTT"/>
          </operator>
          <connect from_port="cluster subset" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_cluster subset" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="TestData2" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
      <connect from_op="Loop Clusters" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="144"/>
    </process>
  </operator>
</process>
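
For comparison, here is a minimal sketch of what I would expect the Replace step above to produce for the two values of dataset. It assumes that Replace treats replace_what as a regular expression and substitutes every match; the helper name replace_all is only illustrative and is not part of RapidMiner.

```python
import re


def replace_all(value: str, replace_what: str, replace_by: str) -> str:
    """Replace every regex match of replace_what in value with replace_by."""
    return re.sub(replace_what, replace_by, value)


# Same settings as the Replace node: replace_what="t", replace_by="TTT"
for value in ("test", "train"):
    print(value, "->", replace_all(value, "t", "TTT"))
# test -> TTTesTTT
# train -> TTTrain
```

Under that assumption “train” should become “TTTrain”, so seeing “TTTesTTT” for both values suggests the replaced values are being mixed up somewhere rather than computed per value.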
In another process I work on each cluster by applying a LinearRegression. I want to see the results of the operation, but I am not able to show them in the “Result page”.
For that reason, I decided to save them using the nodes WritePerformance and WriteModel. In the file path I need to use the parameter macros, but they do not seem to work (is this related to this bug? http://bugs.rapid-i.com/show_bug.cgi?id=84#c0).
I tried both the old version  %{a}  and the one proposed in the tutorial  %{loop_value} , but neither worked!

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="521" width="1016">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
        <parameter key="repository_entry" value="data2"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID|dataset"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
        <parameter key="name" value="DistrictName"/>
        <parameter key="target_role" value="cluster"/>
      </operator>
      <operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
        <process expanded="true" height="502" width="1016">
          <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="dataset=train"/>
          </operator>
          <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID"/>
          </operator>
          <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="447" y="30"/>
          <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples (3)" width="90" x="179" y="120">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="dataset=test"/>
          </operator>
          <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (3)" width="90" x="313" y="120">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="581" y="75">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" expanded="true" height="76" name="Performance" width="90" x="782" y="30">
            <parameter key="root_mean_squared_error" value="true"/>
          </operator>
          <operator activated="true" class="write_performance" expanded="true" height="60" name="Write Performance" width="90" x="916" y="30">
            <parameter key="performance_file" value="C:\IO\loop\performance%{a}.per"/>
          </operator>
          <operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="782" y="120">
            <parameter key="model_file" value="C:\IO\loop\model%{a}.mod"/>
            <parameter key="output_type" value="XML"/>
          </operator>
          <connect from_port="cluster subset" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="original" to_op="Filter Examples (3)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
          <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
          <connect from_op="Performance" from_port="performance" to_op="Write Performance" to_port="input"/>
          <portSpacing port="source_cluster subset" spacing="0"/>
          <portSpacing port="source_in 1" spacing="54"/>
          <portSpacing port="sink_out 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="TestData2" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
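
To make clear what I was hoping for with the file paths in the process above, here is a rough sketch of the macro expansion. It assumes that %{a} expands to the number of times the operator has been applied, which is my understanding of that macro; the function expand_macros and the four iterations are purely hypothetical and do not reflect how RapidMiner implements this.

```python
import re


def expand_macros(template: str, macros: dict) -> str:
    """Substitute each %{name} with macros[name]; unknown macros are left as-is."""
    def lookup(match):
        name = match.group(1)
        return str(macros.get(name, match.group(0)))
    return re.sub(r"%\{([^}]+)\}", lookup, template)


template = r"C:\IO\loop\performance%{a}.per"
# Hypothetical: one application of Write Performance per loop iteration
for count in range(1, 5):
    print(expand_macros(template, {"a": count}))
```

With working macros, each iteration would write to its own file instead of overwriting a single one.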

Answers

  • land
    land New Altair Community Member
    Hi,
    could you try to reproduce this behavior on a generated example set? Then I could simply load your process and see what's going wrong.
    Thank you in advance.

    Greetings,
      Sebastian
  • Armen
    Armen New Altair Community Member
    Here is the data you can use to run these example streams:
    ID;dataset;price;postcode;year;house_size;ground_size
    1;train;4438979;A;1946;179;2759
    2;train;9375083;B;1936;157;1986
    3;test;8111665;C;1928;179;2428
    4;train;8927810;D;1912;102;2872
    5;test;643078;A;1996;102;1179
    6;test;9116892;B;1920;177;1087
    7;test;3343224;C;1918;177;1474
    8;train;1559067;D;1999;141;2835
    9;test;1642734;A;1991;156;1331
    10;test;1143448;B;1998;103;1601
    11;train;6054905;C;1913;95;1324
    12;train;4652272;D;1938;141;1771
    13;test;1007964;A;1987;188;1890
    14;train;9427575;B;1999;115;964
    15;train;3971974;C;1934;136;2155
    16;test;9756122;D;1984;195;1759
    17;test;6982381;A;1979;194;2855
    18;train;5361267;B;1943;100;1740
    19;test;92495;C;1997;93;2580
    20;test;3853701;D;1997;126;2937
    21;test;3672939;A;1934;105;2136
    22;train;8553445;B;1984;110;939
    23;train;2250150;C;1929;173;2851
    24;test;2456895;D;1997;101;2952
    25;test;5726626;A;1945;138;2499
    26;test;9081445;B;1923;187;636
    27;test;2280029;C;1900;187;2812
    28;test;2979554;D;1999;141;2743
    29;test;3102467;A;1978;182;1706
    30;test;6937219;B;1970;91;2273
    31;train;5536152;C;1906;193;2424
    32;train;8850263;D;1927;123;938
    33;train;9247487;A;1920;154;2514
    34;train;1245596;B;1931;90;2081
    35;test;624383;C;1934;142;503
    36;test;5707598;D;1977;192;2749
    37;train;4695273;A;1976;155;1156
    38;train;5992180;B;1958;108;2907
    39;test;4963939;C;1911;198;1493
    40;test;8146548;D;1928;125;2466
    41;test;927543;A;1900;196;1125
    42;test;9800296;B;1925;143;2194
    43;train;5982102;C;1955;158;565
    44;train;8238828;D;1936;196;1127
    45;train;7748803;A;1980;191;539
    46;train;1028282;B;1964;200;705
    47;test;6424228;C;1912;162;2277
    48;train;9347885;D;1980;195;1902
    49;train;5836803;A;1976;184;2611
    50;train;6594397;B;1909;108;1599
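
As a quick sanity check on the data, the sketch below shows the per-cluster train/test split that the loop is meant to work on. It only uses the first eight rows of the table above, and the function name split_by_cluster is just for illustration; it mimics Set Role on postcode followed by the two Filter Examples conditions, not the RapidMiner operators themselves.

```python
import csv
import io
from collections import defaultdict

# First eight rows of the sample data above
SAMPLE = """ID;dataset;price;postcode;year;house_size;ground_size
1;train;4438979;A;1946;179;2759
2;train;9375083;B;1936;157;1986
3;test;8111665;C;1928;179;2428
4;train;8927810;D;1912;102;2872
5;test;643078;A;1996;102;1179
6;test;9116892;B;1920;177;1087
7;test;3343224;C;1918;177;1474
8;train;1559067;D;1999;141;2835
"""


def split_by_cluster(text):
    """Group row IDs by postcode, then by the train/test flag in the dataset column."""
    groups = defaultdict(lambda: {"train": [], "test": []})
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        groups[row["postcode"]][row["dataset"]].append(int(row["ID"]))
    return {cluster: dict(parts) for cluster, parts in sorted(groups.items())}


for cluster, parts in split_by_cluster(SAMPLE).items():
    print(cluster, parts)
```

Each postcode (A to D) forms one cluster, and within a cluster the rows are separated by the dataset column.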


    Here is the example code for the Replace problem:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="521" width="1016">
          <operator activated="true" breakpoints="after" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
            <parameter key="repository_entry" value="//NewLocalRepository/data50"/>
          </operator>
          <operator activated="true" breakpoints="after" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">
            <parameter key="name" value="postcode"/>
            <parameter key="target_role" value="cluster"/>
          </operator>
          <operator activated="true" breakpoints="after" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
            <process expanded="true" height="502" width="1059">
              <operator activated="true" breakpoints="after" class="replace" expanded="true" height="76" name="Replace" width="90" x="447" y="30">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="dataset"/>
                <parameter key="replace_what" value="t"/>
                <parameter key="replace_by" value="TTT"/>
              </operator>
              <connect from_port="cluster subset" to_op="Replace" to_port="example set input"/>
              <connect from_op="Replace" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_cluster subset" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="TestData2" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
          <connect from_op="Loop Clusters" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="144"/>
        </process>
      </operator>
    </process>


    Here is the example code for the save problem:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="521" width="1016">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
            <parameter key="repository_entry" value="data50"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">
            <parameter key="name" value="postcode"/>
            <parameter key="target_role" value="cluster"/>
          </operator>
          <operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
            <process expanded="true" height="502" width="1016">
              <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="dataset=train"/>
              </operator>
              <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="30">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="ID|year|price|postcode|house_size|ground_size"/>
              </operator>
              <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="447" y="30"/>
              <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples (3)" width="90" x="179" y="120">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="dataset=test"/>
              </operator>
              <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (3)" width="90" x="313" y="120">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="ID|year|price|postcode|house_size|ground_size"/>
              </operator>
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="581" y="75">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_regression" expanded="true" height="76" name="Performance" width="90" x="782" y="30">
                <parameter key="root_mean_squared_error" value="true"/>
              </operator>
              <operator activated="true" class="write_performance" expanded="true" height="60" name="Write Performance" width="90" x="916" y="30">
                <parameter key="performance_file" value="C:\IO\loop\performance%{a}.per"/>
              </operator>
              <operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="782" y="120">
                <parameter key="model_file" value="C:\IO\loop\model%{a}.mod"/>
                <parameter key="output_type" value="XML"/>
              </operator>
              <connect from_port="cluster subset" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="original" to_op="Filter Examples (3)" to_port="example set input"/>
              <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
              <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
              <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
              <connect from_op="Performance" from_port="performance" to_op="Write Performance" to_port="input"/>
              <portSpacing port="source_cluster subset" spacing="0"/>
              <portSpacing port="source_in 1" spacing="54"/>
              <portSpacing port="sink_out 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="TestData2" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
  • land
    land New Altair Community Member
    Hi,
    everything works fine for me, so it seems these bugs have already been fixed in the current version.
    Your problems should therefore be solved with the final version.

    Greetings,
      Sebastian