"Saving detail results (folder view) of unsupervised clustering"
I've been running unsupervised learning processes and obtaining results. I would like to save the detailed results, which show the cluster number and the individual items in each cluster (folder view), so I can check the results quickly. When I use ResultWriter, it only writes the Text View (summary) to a file.
Is there another operator, or a cluster parameter I can set, that will save the folder view to a file?
Or do I have to save the cluster model and then run supervised classification to see the detailed results?
Thanks.
Hi,
If you want to show the folder view from your own application (?), reloading the model and using the available cluster model visualization might be an option. This takes only one line for a loaded ClusterModel "cm":
Component folderView = new ExtendedJScrollPane(new ClusterTreeVisualization((FlatClusterModel) cm));
Maybe this helps (in case you want to display it in another application).
Cheers,
Ingo
Tobias, Ingo
While looking for something else I discovered ClusterModel2ExampleSet. This does what I need, and then I can write the example set to a database, Excel, etc.
However, the row/record ID and cluster number are written at the end of each example line, and when I save to Excel the cluster info is dropped. There is no switch on ClusterModel2ExampleSet or ExampleSetWriter to place the record ID and cluster number at the beginning of the line. (My small test set has over 1800 terms, which become 1800+ columns when the data is written.)
Is it possible to add an ExampleSetWriter parameter to place the important information at the beginning of the line? Then I can import it directly into my database.
For now, I can write a parser that reads the end of each line in a CSV file and extracts what I need.
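A minimal Java sketch of such a parser, assuming (hypothetically) a comma-separated file in which the last two fields of every line are the record ID and the cluster, in that order, and no field contains the delimiter. The class and method names are made up for illustration. It scans backwards with lastIndexOf instead of splitting all 1800+ columns of each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts the record ID and cluster from the END of each
// delimited line, assuming the last two fields are "id" then "cluster".
final class TailFieldParser {

    // Returns {id, cluster} from the last two fields of one line.
    static String[] idAndCluster(String line, char delimiter) {
        int last = line.lastIndexOf(delimiter);
        if (last < 0) {
            throw new IllegalArgumentException("need at least two fields: " + line);
        }
        // -1 here means the line has exactly two fields; substring(0, last) is then the id.
        int secondLast = line.lastIndexOf(delimiter, last - 1);
        String id = line.substring(secondLast + 1, last).trim();
        String cluster = line.substring(last + 1).trim();
        return new String[] {id, cluster};
    }

    // Applies idAndCluster to every non-blank line from the reader.
    static List<String[]> parseAll(BufferedReader reader, char delimiter) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                rows.add(idAndCluster(line, delimiter));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample lines: two attribute values, then id, then cluster.
        String sample = "0.12,3.40,1,cluster_2\n5.00,0.75,2,cluster_0\n";
        for (String[] row : parseAll(new BufferedReader(new StringReader(sample)), ',')) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

For a real file, pass a reader from java.nio.file.Files.newBufferedReader(path) instead of the StringReader.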
Thanks. (I am sure you guys are being hammered with extra requests from the new release.)
B.
Hi,
Well, this is actually already possible. Just use the ExampleSetWriter operator, set its "format" parameter to "special_format", and define an appropriate "special_format" parameter. The documentation (the tutorial, the tooltip inside the program, or F1) shows the possible settings for the special format parameter. Anyway, here are the solutions for the three suggested scenarios:
0 - as it is now
Here you can just use the "dense" format:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="number_of_attributes" value="2"/>
<parameter key="target_function" value="gaussian mixture clusters"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="label"/>
</operator>
<operator name="KMeans" class="KMeans">
<parameter key="k" value="4"/>
</operator>
<operator name="ClusterModel2ExampleSet" class="ClusterModel2ExampleSet">
<parameter key="keep_cluster_model" value="false"/>
</operator>
<operator name="ExampleSetWriter" class="ExampleSetWriter">
<parameter key="example_set_file" value="cm_out_complete.dat"/>
</operator>
</operator>
1 - place rec ID and cluster number at the beginning of the line and attributes after
Here you can use the special format "$i $v[cluster] $a":
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="number_of_attributes" value="2"/>
<parameter key="target_function" value="gaussian mixture clusters"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="label"/>
</operator>
<operator name="KMeans" class="KMeans">
<parameter key="k" value="4"/>
</operator>
<operator name="ClusterModel2ExampleSet" class="ClusterModel2ExampleSet">
<parameter key="keep_cluster_model" value="false"/>
</operator>
<operator name="ExampleSetWriter" class="ExampleSetWriter">
<parameter key="example_set_file" value="cm_out_complete.dat"/>
<parameter key="format" value="special_format"/>
<parameter key="special_format" value="$i $v[cluster] $a"/>
</operator>
</operator>
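To illustrate what such a line might look like, here is a small, hypothetical Java emulation of expanding "$i $v[cluster] $a" for one example. It is not RapidMiner code; it assumes that $i stands for the id, $v[name] for the value of the attribute with that name, and $a for all regular attribute values separated by single spaces. Check the ExampleSetWriter documentation for the real placeholder list and separators.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical emulation of a special_format string; NOT RapidMiner's implementation.
// Assumed placeholders: $i = id, $v[name] = value of attribute "name",
// $a = all regular attribute values joined by a single space.
final class SpecialFormatSketch {

    private static final Pattern TOKEN = Pattern.compile("\\$(i|a|v\\[([^\\]]+)\\])");

    static String render(String format, String id,
                         Map<String, String> special, Map<String, String> regular) {
        Matcher m = TOKEN.matcher(format);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find()) {
            out.append(format, pos, m.start());      // copy literal text before the placeholder
            String token = m.group(1);
            if (token.equals("i")) {
                out.append(id);
            } else if (token.equals("a")) {
                out.append(String.join(" ", regular.values()));
            } else {                                 // $v[name]: look in special, then regular attributes
                String name = m.group(2);
                String value = special.containsKey(name) ? special.get(name) : regular.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("unknown attribute: " + name);
                }
                out.append(value);
            }
            pos = m.end();
        }
        out.append(format, pos, format.length());    // copy any trailing literal text
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> regular = new LinkedHashMap<>();  // keeps column order
        regular.put("att1", "0.42");
        regular.put("att2", "-1.3");
        Map<String, String> special = new LinkedHashMap<>();
        special.put("cluster", "cluster_1");
        System.out.println(render("$i $v[cluster] $a", "7", special, regular)); // 7 cluster_1 0.42 -1.3
        System.out.println(render("$i $v[cluster]", "7", special, regular));    // 7 cluster_1
    }
}
```

The attribute names and values in main are made up; they only show how the placeholder order determines the column order of each written line.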
2 - output only rec ID and cluster number, leave off cluster attributes
Here you can just omit "$a" from the special format of scenario 1, as in this example:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="number_of_attributes" value="2"/>
<parameter key="target_function" value="gaussian mixture clusters"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="FeatureNameFilter" class="FeatureNameFilter">
<parameter key="filter_special_features" value="true"/>
<parameter key="skip_features_with_name" value="label"/>
</operator>
<operator name="KMeans" class="KMeans">
<parameter key="k" value="4"/>
</operator>
<operator name="ClusterModel2ExampleSet" class="ClusterModel2ExampleSet">
<parameter key="keep_cluster_model" value="false"/>
</operator>
<operator name="ExampleSetWriter" class="ExampleSetWriter">
<parameter key="example_set_file" value="cm_out_complete.dat"/>
<parameter key="format" value="special_format"/>
<parameter key="special_format" value="$i $v[cluster]"/>
</operator>
</operator>
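With the "$i $v[cluster]" output, each line holds only an ID and a cluster, so the folder view (cluster to member IDs) can be rebuilt outside RapidMiner. A minimal Java sketch, assuming whitespace-separated fields, no spaces inside values, and that any line starting with "#" is a comment; the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: rebuilds a folder view (cluster -> member ids) from lines
// of the form "<id> <cluster>", as assumed to be written by "$i $v[cluster]".
final class FolderViewFromFile {

    static Map<String, List<String>> group(BufferedReader reader) throws IOException {
        Map<String, List<String>> folders = new TreeMap<>();   // clusters sorted by name
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;                                      // skip blank and comment lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 2) {
                throw new IllegalArgumentException("expected '<id> <cluster>': " + line);
            }
            folders.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(fields[0]);
        }
        return folders;
    }

    public static void main(String[] args) throws IOException {
        String sample = "1 cluster_0\n2 cluster_2\n3 cluster_0\n";   // made-up sample
        Map<String, List<String>> folders = group(new BufferedReader(new StringReader(sample)));
        // Prints "cluster_0 [1, 3]" and then "cluster_2 [2]".
        folders.forEach((cluster, ids) -> System.out.println(cluster + " " + ids));
    }
}
```

A TreeMap is used so the clusters come out in a stable, sorted order; note that this is string order, so a hypothetical "cluster_10" would sort before "cluster_2".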
Cheers,
Ingo
I had more time to play around with the tool now. It seems this might have to do with the Weka unsupervised learners: I didn't get the "cluster" word with KMeans or support vector clustering...
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSource" class="ExampleSource">
<parameter key="attributes" value="C:\Documents and Settings\shahv\My Documents\rm_workspace\temp.aml"/>
</operator>
<operator name="W-EM" class="W-EM">
<parameter key="N" value="5.0"/>
</operator>
</operator>
Resulting .aml file:
<?xml version="1.0" encoding="windows-1252"?>
<attributeset default_source="..\..\Program Files (x86)\Rapid-I\RapidMiner-4.1\temp.dat">
<attribute
name = "feature_id.dat (2)"
sourcecol = "1"
valuetype = "integer"/>
<attribute
name = "feature_id.dat (3)"
sourcecol = "2"
valuetype = "integer"/>
<attribute
name = "feature_id.dat (4)"
sourcecol = "3"
valuetype = "integer"/>
<attribute
name = "feature_id.dat (5)"
sourcecol = "4"
valuetype = "integer"/>
<id
name = "feature_id.dat (1)"
sourcecol = "5"
valuetype = "integer"/>
<cluster
name = "cluster"
sourcecol = "6"
valuetype = "nominal">
<value>cluster4</value>
<value>cluster0</value>
<value>cluster3</value>
<value>cluster2</value>
<value>cluster1</value>
</cluster>
</attributeset>
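The .aml file above records, via sourcecol, which 1-based column of the data file holds the id and which holds the cluster. A minimal Java sketch using the JDK's DOM parser to look those columns up and pull them from one data line; the class and method names are hypothetical, and the column separators (comma, semicolon or whitespace) are an assumption about the data file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: uses the "sourcecol" entries of an .aml attribute
// description to find and extract the id and cluster columns of a data line.
final class AmlColumns {

    // Parses the .aml text and returns the root <attributeset> element.
    static Element parse(String amlXml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(amlXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse .aml text", e);
        }
    }

    // 1-based sourcecol of the first direct child with this tag ("id", "cluster", ...), or -1.
    static int sourceColumn(Element attributeSet, String tag) {
        NodeList children = attributeSet.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                return Integer.parseInt(((Element) n).getAttribute("sourcecol").trim());
            }
        }
        return -1;
    }

    // Returns the value in a 1-based column; assumes comma, semicolon or whitespace separators.
    static String field(String dataLine, int sourceCol) {
        String[] parts = dataLine.trim().split("[,;\\s]+");
        return parts[sourceCol - 1];
    }

    public static void main(String[] args) {
        // Shortened version of the .aml shown above.
        String aml = "<attributeset>"
                + "<attribute name=\"feature_id.dat (2)\" sourcecol=\"1\" valuetype=\"integer\"/>"
                + "<id name=\"feature_id.dat (1)\" sourcecol=\"2\" valuetype=\"integer\"/>"
                + "<cluster name=\"cluster\" sourcecol=\"3\" valuetype=\"nominal\"/>"
                + "</attributeset>";
        Element root = parse(aml);
        String line = "42 7 cluster4";   // made-up data line
        System.out.println(field(line, sourceColumn(root, "id")) + " -> "
                + field(line, sourceColumn(root, "cluster")));   // 7 -> cluster4
    }
}
```

With the full .aml above, the same calls would report column 5 for the id and column 6 for the cluster.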
As you already noticed, a ResultWriter only saves the textual representations of the objects that a process outputs. The operator you are looking for is called IOObjectWriter, which lets you save any IOObject, in particular also a ClusterModel. Hence, you have to choose ClusterModel for the parameter "io_object" in order to save your ClusterModel. You may then load that ClusterModel back into RapidMiner by using the corresponding IOObjectReader, which is in the same operator group (IO.Other).
Hope that solves your problem.
Regards,
Tobias