"[Solved]Convert Numeric to Nominal after k-means clustering"
nachiket
New Altair Community Member
I am new to RapidMiner. I have an Excel dataset.
I want to apply k-means clustering to this dataset and then Bayesian classification to the result.
I imported the Excel file (all fields except FID as text) and applied Nominal to Numerical so that I could run k-means. Now I want the clusters with the original values from the input Excel file (not the numeric data) so that I can apply Bayes classification to them.
How can I do a Numeric to Nominal conversion on all of the fields?
Sample data (1100 rows):
FID | Geology | Geomorphology | Land use_land cover | Rainfall | SLOPE | Soil | zone
0 | Fissile hornblende biotite gneiss | HIGHLY DISSECTED DIFLECTION SLOPE | FOREST | 1200-1400 | >60% | BROWN CLAY | High
1 | Fissile hornblende biotite gneiss | HIGHLY DISSECTED DIFLECTION SLOPE | FOREST | 1200-1400 | 30-60% | BROWN CLAY | Moderate
Answers
Hi,
I'm not sure I understood you correctly, but does the example process below help you? It uses the Multiply operator to create two copies of your data: one copy is converted to numerical values and clustered, and at the end the cluster assignments are joined back to your original data by ID.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.001-SNAPSHOT">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.1.001-SNAPSHOT" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_nominal_data" compatibility="6.1.001-SNAPSHOT" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="75"/>
<operator activated="true" class="generate_id" compatibility="6.1.001-SNAPSHOT" expanded="true" height="76" name="Generate ID" width="90" x="179" y="75"/>
<operator activated="true" class="multiply" compatibility="6.1.001-SNAPSHOT" expanded="true" height="94" name="Multiply" width="90" x="313" y="75"/>
<operator activated="true" class="nominal_to_numerical" compatibility="6.1.001-SNAPSHOT" expanded="true" height="94" name="Nominal to Numerical" width="90" x="514" y="30">
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="k_means" compatibility="6.1.001-SNAPSHOT" expanded="true" height="76" name="Clustering" width="90" x="647" y="30"/>
<operator activated="true" class="select_attributes" compatibility="6.1.001-SNAPSHOT" expanded="true" height="76" name="Select Attributes" width="90" x="781" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="cluster|id|label"/>
</operator>
<operator activated="true" class="join" compatibility="6.1.001-SNAPSHOT" expanded="true" height="76" name="Join" width="90" x="916" y="75">
<list key="key_attributes"/>
</operator>
<connect from_op="Generate Nominal Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
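For readers who want to see the same data flow outside RapidMiner, here is a minimal Python sketch of the idea. It is not a translation of the process above. It assumes one-hot encoding for the nominal-to-numerical step and uses a toy k-means seeded with the first k distinct rows. The helper names (one_hot, k_means, cluster_and_join), the "cluster_N" naming, and the simplified column names are made up for illustration.

```python
def one_hot(rows, columns):
    """Encode nominal columns as 0/1 vectors (stands in for 'Nominal to Numerical')."""
    categories = [(c, v) for c in columns for v in sorted({r[c] for r in rows})]
    return [[1.0 if r[c] == v else 0.0 for c, v in categories] for r in rows]


def k_means(points, k, iterations=10):
    """Toy k-means; seeded with the first k distinct points so runs are repeatable."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_and_join(rows, columns, k):
    """Cluster a numeric copy, then attach the cluster to the untouched original rows by id."""
    numeric = one_hot(rows, columns)                      # copy 1: numeric, clustering only
    by_id = dict(zip((r["id"] for r in rows), k_means(numeric, k)))
    return [dict(r, cluster=f"cluster_{by_id[r['id']]}") for r in rows]  # copy 2 + cluster


# Two rows abbreviated from the sample data in the question (column names simplified).
rows = [
    {"id": 0, "SLOPE": ">60%", "zone": "High"},
    {"id": 1, "SLOPE": "30-60%", "zone": "Moderate"},
]
for row in cluster_and_join(rows, ["SLOPE", "zone"], k=2):
    print(row)
```

The point of keeping two copies is that the numeric encoding never has to be reversed: the original nominal values are still there, and only the cluster column is carried over.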
Regards,
Marco

Yes, thank you very much. I got the clustered output with the original names. However, there is one extra attribute, cluster, and id is at the end (shown like a float number). Can you please tell me how I can use Naive Bayes on it?
PS: I am actually trying to integrate Bayes classification with k-means clustering.

I am interested in that topic, too!
Did you solve how to apply NB with clustering? Can you show me how?
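
One possible reading of this question is to treat the cluster attribute as the class label and train Naive Bayes on the original nominal attributes, leaving id out because it only identifies rows. In RapidMiner terms that would presumably mean giving cluster the label role (for example with Set Role) before a Naive Bayes operator, though nobody in this thread confirmed that setup. The sketch below shows the idea in plain Python with Laplace correction; the function names and the toy rows are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, features, label):
    """Count label frequencies and, per label, the value frequencies of each nominal feature."""
    label_counts = Counter(r[label] for r in rows)
    value_counts = defaultdict(Counter)   # (feature, label) -> Counter of values
    domains = defaultdict(set)            # feature -> set of values seen in training
    for r in rows:
        for f in features:
            value_counts[(f, r[label])][r[f]] += 1
            domains[f].add(r[f])
    return label_counts, value_counts, domains


def predict(model, row, features):
    """Return the label with the highest log posterior for one row."""
    label_counts, value_counts, domains = model
    total = sum(label_counts.values())

    def log_posterior(lbl):
        score = math.log(label_counts[lbl] / total)
        for f in features:
            # Laplace correction: unseen values get a small, non-zero probability.
            num = value_counts[(f, lbl)][row[f]] + 1
            den = label_counts[lbl] + len(domains[f])
            score += math.log(num / den)
        return score

    return max(label_counts, key=log_posterior)


# Toy clustered rows (hypothetical); "id" is deliberately not used as a feature.
clustered = [
    {"id": 0, "SLOPE": ">60%", "zone": "High", "cluster": "cluster_0"},
    {"id": 1, "SLOPE": ">60%", "zone": "High", "cluster": "cluster_0"},
    {"id": 2, "SLOPE": "30-60%", "zone": "Moderate", "cluster": "cluster_1"},
    {"id": 3, "SLOPE": "30-60%", "zone": "Moderate", "cluster": "cluster_1"},
]
features = ["SLOPE", "zone"]
model = train_naive_bayes(clustered, features, label="cluster")
print(predict(model, {"SLOPE": ">60%", "zone": "High"}, features))
```

Whether predicting the cluster is actually the goal depends on what "integrating" the two methods is supposed to achieve; if a separate target attribute exists, that attribute would be the label and cluster would become one more nominal feature.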