"(solved) Clustering and classifying an unlabelled dataset"
blueearth
New Altair Community Member
Hi all.
I have an example set without any special attributes. Is it possible to run unsupervised clustering or classification on it?
For example, I have a set of regular attributes and I want a model to cluster or classify the examples based on those regular attributes alone. Is there an operator or process for this purpose?
Thank you.
Answers
Hello
Yes indeed - all the clustering algorithms can do this.
Here's an example using k-means. For fun, it also joins the cluster result back to the original and maps clusters to labels to come up with a classification performance.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="431" width="1016">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|a4|a3|a2|a1"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="313" y="30">
<parameter key="k" value="3"/>
<parameter key="measure_types" value="NumericalMeasures"/>
<parameter key="numerical_measure" value="CosineSimilarity"/>
</operator>
<operator activated="true" class="replace" compatibility="5.2.008" expanded="true" height="76" name="Replace" width="90" x="313" y="300">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
<parameter key="include_special_attributes" value="true"/>
<parameter key="replace_what" value="id_(.*)"/>
<parameter key="replace_by" value="$1"/>
</operator>
<operator activated="true" class="guess_types" compatibility="5.2.008" expanded="true" height="76" name="Guess Types" width="90" x="447" y="300">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="guess_types" compatibility="5.2.008" expanded="true" height="76" name="Guess Types (2)" width="90" x="447" y="165">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="id"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="join" compatibility="5.2.008" expanded="true" height="76" name="Join" width="90" x="581" y="120">
<list key="key_attributes"/>
</operator>
<operator activated="true" class="map_clustering_on_labels" compatibility="5.2.008" expanded="true" height="76" name="Map Clustering on Labels" width="90" x="715" y="30"/>
<operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="849" y="30"/>
<connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Select Attributes" from_port="original" to_op="Replace" to_port="example set input"/>
<connect from_op="Clustering" from_port="cluster model" to_op="Map Clustering on Labels" to_port="cluster model"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Guess Types (2)" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Guess Types" to_port="example set input"/>
<connect from_op="Guess Types" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Guess Types (2)" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Guess Types (2)" from_port="original" to_port="result 2"/>
<connect from_op="Join" from_port="join" to_op="Map Clustering on Labels" to_port="example set"/>
<connect from_op="Map Clustering on Labels" from_port="example set" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
regards
Andrew
Hi, thank you so much,
but unfortunately I didn't get it.
That example has special attributes such as label and id, but what I have is an example set without any special attributes, not even an id. It is all just regular attributes, and I want to know whether it is possible to cluster or classify the examples according to the regular attributes.
Thanks a lot.
Hello
Select the Clustering operator and set a breakpoint before it executes and one after.
If you run the process you will see that the input to the operator is an example set consisting of 4 regular attributes whilst the output has an id and a cluster attribute added.
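To make that easier to see, here is a cut-down sketch of the process above that keeps only Retrieve, Select Attributes and Clustering. It uses nothing beyond the operators, parameters and ports already in that process; the layout coordinates are arbitrary, and I am assuming your own data would simply replace the Iris sample in Retrieve. The clustered set is delivered to result 1 and the cluster model to result 2.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="431" width="1016">
      <!-- Load the sample data; swap in your own repository entry here -->
      <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <!-- Keep only the four regular attributes, so label and id are dropped -->
      <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|a4|a3|a2|a1"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <!-- Cluster on regular attributes only, same settings as the process above -->
      <operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="313" y="30">
        <parameter key="k" value="3"/>
        <parameter key="measure_types" value="NumericalMeasures"/>
        <parameter key="numerical_measure" value="CosineSimilarity"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 1"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
```

Nothing in this version depends on a label or an id in the input, so it should match your situation of having regular attributes only.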
regards
Andrew
Thank you so much.