A program to recognize and reward our most engaged community members
<?xml version="1.0" encoding="UTF-8" standalone="no"?><process version="5.0"> <context> <input> <location/> </input> <output> <location/> <location/> <location/> </output> <macros/> </context> <operator activated="true" class="process" expanded="true" name="Process"> <process expanded="true" height="116" width="279"> <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/> <operator activated="true" class="work_on_subset" expanded="true" height="94" name="Work on Subset" width="90" x="179" y="30"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="att1"/> <parameter key="attributes" value="att3|att2"/> <process expanded="true" height="586" width="683"> <operator activated="true" class="agglomerative_clustering" expanded="true" height="76" name="Clustering" width="90" x="45" y="30"/> <connect from_port="exampleSet" to_op="Clustering" to_port="example set"/> <connect from_op="Clustering" from_port="cluster model" to_port="through 1"/> <connect from_op="Clustering" from_port="example set" to_port="example set"/> <portSpacing port="source_exampleSet" spacing="0"/> <portSpacing port="sink_example set" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> <portSpacing port="sink_through 2" spacing="0"/> </process> </operator> <connect from_op="Generate Data" from_port="output" to_op="Work on Subset" to_port="example set"/> <connect from_op="Work on Subset" from_port="example set" to_port="result 1"/> <connect from_op="Work on Subset" from_port="through 1" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> </process> </operator></process>
Sebastian Land wrote:Hi,normally all non special attributes are used for calculating the distance. So you have two choices: You could either set all other attributes to be special using the Set Role operator on each of them, or you could simple put the Agglomerative Clustering into a Work on Subset operator, which let's you select the attributes. After the subprocess is executed on the subset, the old attributes are attached to the ExampleSet again.
Sebastian Land wrote:Hi,could you please post me your process? Perhaps there's an error in the meta data transformation, that only occurs under special circumstances.Greetings, Sebastian
"Date","Location","Download","Upload","Latency"05/02/2010 21:39:00,"Date",4070,351,16605/02/2010 21:38:00,"home",3793,352,16405/02/2010 21:38:00,"home",4447,350,16905/02/2010 21:38:00,"home",3595,350,15905/02/2010 21:37:00,"home",3077,327,177005/02/2010 21:37:00,"home",2230,309,25905/02/2010 11:52:00,"downtown",76,117,21905/02/2010 11:52:00,"downtown",163,68,20505/02/2010 11:51:00,"downtown",723,231,18605/02/2010 11:51:00,"downtown",377,0,27004/02/2010 21:50:00,"home",2632,327,16504/02/2010 21:49:00,"home",2803,328,18804/02/2010 21:49:00,"home",1586,329,27604/02/2010 21:48:00,"home",2765,357,21804/02/2010 21:48:00,"home",1634,198,33504/02/2010 11:43:00,"downtown",692,255,23504/02/2010 11:43:00,"downtown",602,113,271704/02/2010 11:42:00,"downtown",775,56,23904/02/2010 11:42:00,"downtown",779,312,814804/02/2010 11:41:00,"downtown",225,43,22104/02/2010 11:41:00,"downtown",471,286,332803/02/2010 21:50:00,"home",1239,276,422903/02/2010 21:49:00,"home",1339,272,226203/02/2010 21:48:00,"home",1600,313,19703/02/2010 21:47:00,"home",2135,313,18703/02/2010 21:47:00,"home",2026,269,27103/02/2010 11:50:00,"downtown",711,266,21003/02/2010 11:50:00,"downtown",152,315,263803/02/2010 11:49:00,"downtown",24,249,30103/02/2010 11:47:00,"downtown",561,291,174003/02/2010 11:47:00,"downtown",863,115,21302/02/2010 21:54:00,"home",1540,351,20002/02/2010 21:54:00,"home",1493,285,20502/02/2010 21:53:00,"home",1606,319,19402/02/2010 21:53:00,"home",1823,319,17402/02/2010 21:53:00,"home",2150,250,25402/02/2010 12:07:00,"downtown",472,273,226602/02/2010 12:07:00,"downtown",387,267,273602/02/2010 12:06:00,"downtown",381,249,28002/02/2010 12:04:00,"downtown",312,195,377502/02/2010 12:03:00,"downtown",863,260,28102/02/2010 12:02:00,"downtown",405,111,21701/02/2010 21:36:00,"home",3326,354,18301/02/2010 21:36:00,"home",3119,326,17201/02/2010 21:35:00,"home",3677,330,16001/02/2010 21:35:00,"home",3151,355,18201/02/2010 21:35:00,"home",3152,314,28201/02/2010 11:58:00,"downtown",1244,316,171601/02/2010 11:58:00,"downtown",1284,312,19201/02/2010 11:58:00,"downtown",1211,319,20601/02/2010 11:57:00,"downtown",900,310,20801/02/2010 11:57:00,"downtown",683,278,5488
<?xml version="1.0" encoding="UTF-8" standalone="no"?><process version="5.0"> <context> <input> <location/> </input> <output> <location/> <location/> <location/> </output> <macros/> </context> <operator activated="true" class="process" expanded="true" name="Process"> <process expanded="true" height="463" width="547"> <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30"> <parameter key="file_name" value="C:\work\m\test.csv"/> <parameter key="column_separators" value=","/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss.SSS"/> </operator> <operator activated="true" class="work_on_subset" expanded="true" height="76" name="Work on Subset" width="90" x="179" y="30"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attributes" value="Download"/> <parameter key="keep_subset_only" value="true"/> <process expanded="true"> <connect from_port="exampleSet" to_port="example set"/> <portSpacing port="source_exampleSet" spacing="0"/> <portSpacing port="sink_example set" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> </operator> <operator activated="true" class="agglomerative_clustering" expanded="true" height="76" name="Clustering" width="90" x="380" y="30"> <parameter key="measure_types" value="NumericalMeasures"/> </operator> <connect from_op="Read CSV" from_port="output" to_op="Work on Subset" to_port="example set"/> <connect from_op="Work on Subset" from_port="example set" to_op="Clustering" to_port="example set"/> <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/> <connect from_op="Clustering" from_port="example set" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> </process> </operator></process>
Sebastian Land wrote:Hi,the id attribute is automatically added by the clustering algorithm. This is needed to assign an example to an cluster. Hierarchical cluster models always contain 2n -1 entries, because they start with each example being one cluster and then merge two clusters each step. This is performed until only one cluster remains.
This hierarchy might be flatted using the Flatten Clustering operator, which will let the choice, how many clusters you are want to have. If you need it, we could discuss how to add an option for flatten depending on the maximal allowed distance instead of the numbers.