"Clustering with Loops?"
Prably
New Altair Community Member
Hi RM Masters!
I am a novice with RM and inexperienced with loops and macros. I need advice on how to structure a process that loops clustering. I am trying to get three centroids (low / medium / high) for each location and illness combination (see below). The goal is that when future data arrives about how long a contract from [Location B] for [Pain] is taking, I can tell whether it is taking too long, is on track, or is ahead of schedule.
I'm pretty sure I want to run clustering (k-means) in a loop over all unique combinations of the attributes Location and Illness. So I want three centroids for the [Location A & Ebola] subset, three centroids for [Location B & Cold], [Location C & Cold], etc. The numerical attributes Milestone 1, Milestone 2, and Milestone Final are the ones I want to use for clustering.
My data set has about 13,000 examples, and I have some other polynominal attributes that aren't listed here.
Please forgive the formatting; here is a representative sample of the example set:
Contract ID | Location | Illness | Contract Status | Contract Type | Begin Date | Milestone 1 | Milestone 2 | Milestone Final
1 | A | Ebola | Finished | Big | 1/10/2013 | 78 | 133 | 154
2 | A | Aids | Unfinished | Small | 1/5/2009 | 1 | 125 | 162
3 | A | Cold | Finished | Big | 8/17/2012 | 40 | 118 | 214
7 | B | Awesomeness | Finished | Small | 9/27/2007 | 42 | 150 | 209
8 | C | Upset Stomach | Unfinished | Small | 12/20/2009 | 10 | 101 | 219
9 | D | Ebola | Finished | Big | 1/16/2009 | 9 | 111 | 246
10 | D | Headache | Unfinished | Big | 9/11/2005 | 57 | 127 | 238
11 | D | Club Foot | Unfinished | Small | 12/2/2005 | 55 | 141 | 204
12 | D | Aids | Finished | Small | 2/3/2012 | 15 | 106 | 191
13 | D | Upset Stomach | Finished | Small | 11/27/2009 | 48 | 103 | 194
14 | D | Ebola | Finished | Big | 5/18/2005 | 86 | 101 | 160
15 | D | Ebola | Finished | Big | 11/15/2009 | 7 | 148 | 164
16 | D | Pain | Unfinished | Small | 5/25/2005 | 29 | 117 | 242
18 | D | Club foot | Unfinished | Big | 4/28/2011 | 41 | 147 | 190
19 | D | Club foot | Unfinished | Small | 4/20/2007 | 48 | 113 | 229
Also, any thoughts on how to learn to work with loops and macros would be wonderful.
Thanks in advance for the advice!
Answers
This might help. First, use the Generate Concatenation operator to create a new attribute that concatenates Illness and Location (named Illness_Location in the process below). Then feed that into a Loop Values operator set to iterate over the new attribute. Inside the loop's subprocess, filter on the concatenated attribute using the loop macro, that is, Illness_Location=%{loop_value}, so that each iteration sees only one combination. From there, continue with the rest of your process (for example, the clustering). Hope this helps.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="460" width="547">
      <operator activated="true" class="generate_concatenation" compatibility="5.3.000" expanded="true" height="76" name="Generate Concatenation" width="90" x="179" y="165">
        <parameter key="first_attribute" value="Illness"/>
        <parameter key="second_attribute" value="Location"/>
      </operator>
      <operator activated="true" class="loop_values" compatibility="5.3.000" expanded="true" height="76" name="Loop Values" width="90" x="313" y="165">
        <parameter key="attribute" value="Illness_Location"/>
        <process expanded="true" height="663" width="887">
          <operator activated="true" class="filter_examples" compatibility="5.3.000" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="Illness_Location=%{loop_value}"/>
          </operator>
          <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Concatenation" from_port="example set output" to_op="Loop Values" to_port="example set"/>
      <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
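To continue from the filter step toward the three centroids asked for in the question, the subprocess inside Loop Values could be extended roughly as follows. This is only a sketch, not a tested process: the operator classes select_attributes and k_means, their parameter keys, and the port names "cluster model" and "clustered set" are written from memory of RapidMiner 5.3 and are assumptions to check against your own installation. Layout attributes (height, width, x, y) and portSpacing entries are left out for brevity.

```xml
<process expanded="true">
  <!-- Keep only the examples of the current Illness/Location combination -->
  <operator activated="true" class="filter_examples" compatibility="5.3.000" expanded="true" name="Filter Examples">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="Illness_Location=%{loop_value}"/>
  </operator>
  <!-- Assumption: restrict to the three numerical milestone attributes before clustering -->
  <operator activated="true" class="select_attributes" compatibility="5.3.000" expanded="true" name="Select Attributes">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Milestone 1|Milestone 2|Milestone Final"/>
  </operator>
  <!-- Assumption: k-means with k = 3 for the low / medium / high centroids -->
  <operator activated="true" class="k_means" compatibility="5.3.000" expanded="true" name="Clustering">
    <parameter key="k" value="3"/>
  </operator>
  <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
  <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
  <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
  <!-- Deliver the cluster model (centroids) and the clustered examples for each combination -->
  <connect from_op="Clustering" from_port="cluster model" to_port="out 1"/>
  <connect from_op="Clustering" from_port="clustered set" to_port="out 2"/>
</process>
```

With this in place, Loop Values should deliver one cluster model per Illness/Location combination. Two things to watch for: a combination with fewer than three examples cannot yield three centroids, so such groups may need to be filtered out beforehand, and values that differ only in capitalization (Club Foot vs. Club foot in the sample) will be treated as separate groups unless they are cleaned up first.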