"Loops / Iterations"
Armen
New Altair Community Member
I consider loops/iterations one of the most interesting tools in RapidMiner.
I am trying to use ClusterLoop, but something strange happens when I use it.
In this example (see code), a ClusterLoop is set on the cluster attribute DistrictName. The ClusterLoop node contains a simple Replace node, whose description says it “creates new attributes from nominal attributes with replaced substrings”. The Replace node is set on the attribute dataset, which has the values [“train”, “test”].
After switching debug mode on, I start the process and check the results:
- the loop cycles through the different clusters, which is great!
- the values “test” are transformed into “TTTesTTT”, as expected
- the values “train” are transformed into “TTTesTTT” too. Is this a bug?
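For reference, the result one would expect from the Replace settings in this post (replace_what "t", replace_by "TTT") can be checked outside RapidMiner. The sketch below assumes the operator simply substitutes every occurrence of the pattern. It uses Python's `str.replace` as a stand-in, not RapidMiner's own implementation, and the function name `replace_in_values` is made up for illustration.

```python
def replace_in_values(values, replace_what, replace_by):
    """Return a new list with every occurrence of replace_what substituted."""
    return [value.replace(replace_what, replace_by) for value in values]


# The two nominal values of the "dataset" attribute from the post.
original = ["test", "train"]
replaced = replace_in_values(original, "t", "TTT")

for before, after in zip(original, replaced):
    print(before, "->", after)
# test -> TTTesTTT
# train -> TTTrain
```

Under that assumption, "train" would become "TTTrain" rather than "TTTesTTT", which is the discrepancy reported above.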
In another process (the second listing below), I apply LinearRegression to each cluster. I want to see the results of the operation, but I am not able to show them in the “Result page”.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="521" width="1016">
<operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
<parameter key="repository_entry" value="data2"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID|dataset"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
<parameter key="name" value="DistrictName"/>
<parameter key="target_role" value="cluster"/>
</operator>
<operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
<process expanded="true" height="502" width="1059">
<operator activated="true" class="replace" expanded="true" height="76" name="Replace" width="90" x="447" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="dataset"/>
<parameter key="replace_what" value="t"/>
<parameter key="replace_by" value="TTT"/>
</operator>
<connect from_port="cluster subset" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_cluster subset" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="TestData2" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
<connect from_op="Loop Clusters" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="144"/>
</process>
</operator>
</process>
Since I cannot display the results, I decided to save them using the WritePerformance and WriteModel nodes. In the file path I need to use parameter macros, but they do not seem to work. Is this related to this bug? http://bugs.rapid-i.com/show_bug.cgi?id=84#c0
I tried both the old version, %{a}, and the one proposed in the tutorial, %{loop_value}, but neither worked!
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="521" width="1016">
<operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
<parameter key="repository_entry" value="data2"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID|dataset"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
<parameter key="name" value="DistrictName"/>
<parameter key="target_role" value="cluster"/>
</operator>
<operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
<process expanded="true" height="502" width="1016">
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dataset=train"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID"/>
</operator>
<operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="447" y="30"/>
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples (3)" width="90" x="179" y="120">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dataset=test"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (3)" width="90" x="313" y="120">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Year|House_size_tot|House_size|Basement_size|House_ground|DistrictName|Price_2009|ID"/>
</operator>
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="581" y="75">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" expanded="true" height="76" name="Performance" width="90" x="782" y="30">
<parameter key="root_mean_squared_error" value="true"/>
</operator>
<operator activated="true" class="write_performance" expanded="true" height="60" name="Write Performance" width="90" x="916" y="30">
<parameter key="performance_file" value="C:\IO\loop\performance%{a}.per"/>
</operator>
<operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="782" y="120">
<parameter key="model_file" value="C:\IO\loop\model%{a}.mod"/>
<parameter key="output_type" value="XML"/>
</operator>
<connect from_port="cluster subset" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="original" to_op="Filter Examples (3)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
<connect from_op="Performance" from_port="performance" to_op="Write Performance" to_port="input"/>
<portSpacing port="source_cluster subset" spacing="0"/>
<portSpacing port="source_in 1" spacing="54"/>
<portSpacing port="sink_out 1" spacing="0"/>
</process>
</operator>
<connect from_op="TestData2" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
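The file paths in the Write Performance and Write Model operators above rely on a macro being expanded differently in each loop iteration, so that every cluster gets its own file. The following sketch only illustrates that intended effect. It is not RapidMiner's macro engine: the `expand_macros` helper is hypothetical, and treating `%{a}` as a per-iteration counter starting at 1 is an assumption.

```python
import re

# Matches macro references of the form %{name}.
MACRO_PATTERN = re.compile(r"%\{([^}]+)\}")


def expand_macros(template, macros):
    """Replace each %{name} in template with macros[name].

    Unknown macros are left untouched so that a missing definition
    stays visible in the resulting path.
    """
    def substitute(match):
        name = match.group(1)
        if name in macros:
            return str(macros[name])
        return match.group(0)

    return MACRO_PATTERN.sub(substitute, template)


# Path template taken from the Write Model operator in the post.
template = r"C:\IO\loop\model%{a}.mod"

# Assumption: one iteration per cluster, counter starting at 1.
paths = [expand_macros(template, {"a": i}) for i in range(1, 5)]
for path in paths:
    print(path)
```

If the macro is not expanded at all, every iteration writes to the same literal file name, so only the output of the last cluster would remain on disk.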
Answers
Hi,
Could you try to reproduce this behavior on a generated example set? Then I could simply load your process and see what's going wrong.
Thank you in advance.
Greetings,
Sebastian
Here is the data you can use to run these example streams:
ID;dataset;price;postcode;year;house_size;ground_size
1;train;4438979;A;1946;179;2759
2;train;9375083;B;1936;157;1986
3;test;8111665;C;1928;179;2428
4;train;8927810;D;1912;102;2872
5;test;643078;A;1996;102;1179
6;test;9116892;B;1920;177;1087
7;test;3343224;C;1918;177;1474
8;train;1559067;D;1999;141;2835
9;test;1642734;A;1991;156;1331
10;test;1143448;B;1998;103;1601
11;train;6054905;C;1913;95;1324
12;train;4652272;D;1938;141;1771
13;test;1007964;A;1987;188;1890
14;train;9427575;B;1999;115;964
15;train;3971974;C;1934;136;2155
16;test;9756122;D;1984;195;1759
17;test;6982381;A;1979;194;2855
18;train;5361267;B;1943;100;1740
19;test;92495;C;1997;93;2580
20;test;3853701;D;1997;126;2937
21;test;3672939;A;1934;105;2136
22;train;8553445;B;1984;110;939
23;train;2250150;C;1929;173;2851
24;test;2456895;D;1997;101;2952
25;test;5726626;A;1945;138;2499
26;test;9081445;B;1923;187;636
27;test;2280029;C;1900;187;2812
28;test;2979554;D;1999;141;2743
29;test;3102467;A;1978;182;1706
30;test;6937219;B;1970;91;2273
31;train;5536152;C;1906;193;2424
32;train;8850263;D;1927;123;938
33;train;9247487;A;1920;154;2514
34;train;1245596;B;1931;90;2081
35;test;624383;C;1934;142;503
36;test;5707598;D;1977;192;2749
37;train;4695273;A;1976;155;1156
38;train;5992180;B;1958;108;2907
39;test;4963939;C;1911;198;1493
40;test;8146548;D;1928;125;2466
41;test;927543;A;1900;196;1125
42;test;9800296;B;1925;143;2194
43;train;5982102;C;1955;158;565
44;train;8238828;D;1936;196;1127
45;train;7748803;A;1980;191;539
46;train;1028282;B;1964;200;705
47;test;6424228;C;1912;162;2277
48;train;9347885;D;1980;195;1902
49;train;5836803;A;1976;184;2611
50;train;6594397;B;1909;108;1599
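To make the structure of this data easier to check, the sketch below parses the semicolon-separated format and splits it the way the processes in this thread do: one group per postcode (the cluster attribute), each divided into train and test rows. It uses only the first eight rows posted above. The helper name `split_by_cluster` is made up for illustration, and the code is a plain Python stand-in, not a reproduction of RapidMiner's Loop Clusters or Filter Examples operators.

```python
import csv
import io
from collections import defaultdict

# Header plus the first eight rows of the data posted above.
SAMPLE = """\
ID;dataset;price;postcode;year;house_size;ground_size
1;train;4438979;A;1946;179;2759
2;train;9375083;B;1936;157;1986
3;test;8111665;C;1928;179;2428
4;train;8927810;D;1912;102;2872
5;test;643078;A;1996;102;1179
6;test;9116892;B;1920;177;1087
7;test;3343224;C;1918;177;1474
8;train;1559067;D;1999;141;2835
"""


def split_by_cluster(text, cluster_column="postcode"):
    """Group rows by cluster value, then by the 'dataset' column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    clusters = defaultdict(lambda: {"train": [], "test": []})
    for row in reader:
        clusters[row[cluster_column]][row["dataset"]].append(row)
    return clusters


clusters = split_by_cluster(SAMPLE)
for name in sorted(clusters):
    subsets = clusters[name]
    print(name, "train:", len(subsets["train"]), "test:", len(subsets["test"]))
# A train: 1 test: 1
# B train: 1 test: 1
# C train: 0 test: 2
# D train: 2 test: 0
```

The counts shown apply to this eight-row excerpt only. Running the same function on all 50 rows gives the per-cluster train and test sizes that each loop iteration would work with.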
Here is the example code for the Replace problem:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="521" width="1016">
<operator activated="true" breakpoints="after" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
<parameter key="repository_entry" value="//NewLocalRepository/data50"/>
</operator>
<operator activated="true" breakpoints="after" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">
<parameter key="name" value="postcode"/>
<parameter key="target_role" value="cluster"/>
</operator>
<operator activated="true" breakpoints="after" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
<process expanded="true" height="502" width="1059">
<operator activated="true" breakpoints="after" class="replace" expanded="true" height="76" name="Replace" width="90" x="447" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="dataset"/>
<parameter key="replace_what" value="t"/>
<parameter key="replace_by" value="TTT"/>
</operator>
<connect from_port="cluster subset" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_cluster subset" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="TestData2" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
<connect from_op="Loop Clusters" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="144"/>
</process>
</operator>
</process>
Here is the example code for the save problem:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="521" width="1016">
<operator activated="true" class="retrieve" expanded="true" height="60" name="TestData2" width="90" x="45" y="165">
<parameter key="repository_entry" value="data50"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="165">
<parameter key="name" value="postcode"/>
<parameter key="target_role" value="cluster"/>
</operator>
<operator activated="true" class="loop_clusters" expanded="true" height="76" name="Loop Clusters" width="90" x="447" y="165">
<process expanded="true" height="502" width="1016">
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dataset=train"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (2)" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="ID|year|price|postcode|house_size|ground_size"/>
</operator>
<operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="447" y="30"/>
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples (3)" width="90" x="179" y="120">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="dataset=test"/>
</operator>
<operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (3)" width="90" x="313" y="120">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="ID|year|price|postcode|house_size|ground_size"/>
</operator>
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="581" y="75">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" expanded="true" height="76" name="Performance" width="90" x="782" y="30">
<parameter key="root_mean_squared_error" value="true"/>
</operator>
<operator activated="true" class="write_performance" expanded="true" height="60" name="Write Performance" width="90" x="916" y="30">
<parameter key="performance_file" value="C:\IO\loop\performance%{a}.per"/>
</operator>
<operator activated="true" class="write_model" expanded="true" height="60" name="Write Model" width="90" x="782" y="120">
<parameter key="model_file" value="C:\IO\loop\model%{a}.mod"/>
<parameter key="output_type" value="XML"/>
</operator>
<connect from_port="cluster subset" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="original" to_op="Filter Examples (3)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
<connect from_op="Performance" from_port="performance" to_op="Write Performance" to_port="input"/>
<portSpacing port="source_cluster subset" spacing="0"/>
<portSpacing port="source_in 1" spacing="54"/>
<portSpacing port="sink_out 1" spacing="0"/>
</process>
</operator>
<connect from_op="TestData2" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Hi,
Everything works fine for me, so every bug seems to be already fixed in the current version.
Your problems will therefore be solved with the final version.
Greetings,
Sebastian