"Set Role Operator as document Name [Solved]"
Hi all,
I am using RapidMiner for clustering with k-Means.
My question is: how can I find out which documents are in cluster 1, which are in cluster 2, and so on? I need this to calculate the quality of the clusters (using the F-measure and entropy).
FYI, my collection contains around 1000 text documents, named A1 to A1000. How can I see which documents were assigned to each cluster?
Is there an example of how to use the Set Role operator to set the document name as a column after the clustering process?
Regards,
Wael
Many thanks, Marius.
My steps are as follows:
1) Process Documents from Files, with Tokenize inside it
2) Set Role
3) Clustering (k-Means)
FYI: the Set Role operator comes after Process Documents from Files and before the clustering operator.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="296" width="503">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="210">
<list key="text_directories">
<parameter key="mydata" value="D:\Classic_subset\ready"/>
</list>
<process expanded="true" height="586" width="559">
<operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" height="60" name="Tokenize" width="90" x="234" y="30"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="112" y="75">
<list key="set_additional_roles">
<parameter key="metadata_file" value="id"/>
</list>
</operator>
<operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="246" y="30"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
<connect from_op="Set Role" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Wael
Many thanks, Marius.
I still have the same problem: the output does not contain the file name.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="386" width="681">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.001" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="255">
<list key="text_directories">
<parameter key="ssssss" value="F:\dataset"/>
</list>
<process expanded="true" height="519" width="806">
<operator activated="true" class="text:tokenize" compatibility="5.3.001" expanded="true" height="60" name="Tokenize" width="90" x="358" y="30"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="112" y="75"/>
<operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="246" y="30">
<parameter key="name" value="metadata_file"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="metadata_file" value="id"/>
</list>
</operator>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Clustering" to_port="example set"/>
<connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="90"/>
</process>
</operator>
</process>
I did the following:
1) Process Documents from Files, with Tokenize inside it
2) Set Role
3) Clustering (k-Means)
Or:
1) Process Documents from Files, with Tokenize inside it
2) Clustering (k-Means)
3) Set Role
Please have a look at this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="386" width="681">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.001" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="210">
<list key="text_directories">
<parameter key="ssssss" value="F:\dataset"/>
</list>
<process expanded="true" height="519" width="806">
<operator activated="true" class="text:tokenize" compatibility="5.3.001" expanded="true" height="60" name="Tokenize" width="90" x="358" y="30"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="179" y="75">
<parameter key="name" value="metadata_file"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="metadata_file" value="id"/>
</list>
</operator>
<operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="313" y="75"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
<connect from_op="Set Role" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 2"/>
<connect from_op="Clustering" from_port="clustered set" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
Hi Marius,
Yes, I have already upgraded my version.
I want to show the document names in each cluster.
For example, using k-Means I produce 5 clusters, and I want to know which document is in which cluster.
For example:
I have documents a, b, c, d and so on. If documents b and c are in one cluster, just show me the names of those documents as an attribute.
In case you ask why I need that:
I need to calculate the F-measure on my data set.
Regards,
Wael
Many thanks for your help, Marius. Yes, that is done now. Thanks a lot.
Have a look at this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="210">
<list key="text_directories">
<parameter key="oo" value="D:\Classic_subset\test"/>
</list>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.000" expanded="true" height="60" name="Tokenize" width="90" x="234" y="30"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="76" name="Set Role" width="90" x="112" y="75">
<parameter key="attribute_name" value="metadata_file"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="k_means" compatibility="5.3.013" expanded="true" height="76" name="Clustering" width="90" x="246" y="30">
<parameter key="max_runs" value="1"/>
<parameter key="max_optimization_steps" value="1"/>
</operator>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Set Role" to_port="example set input"/>
<connect from_op="Process Documents from Files" from_port="word list" to_port="result 2"/>
<connect from_op="Set Role" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Use the Set Role operator before the clustering operator to assign the id role to the column that contains the document names.
Best regards,
Marius
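Once the file name is kept as id next to the cluster label, the F-measure and entropy mentioned earlier in the thread can be computed outside RapidMiner. Below is a minimal Python sketch, not an official RapidMiner feature. It assumes the assignments have been exported as a mapping from document name to cluster, and that a true class label is known for every document; the function names, the toy documents and the classes "x" and "y" are hypothetical. The F-measure shown is one common clustering variant (best F1 per class, weighted by class size); other definitions exist.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(assignments, truth):
    """Size-weighted average entropy of the true classes inside each cluster (lower is better)."""
    clusters = defaultdict(list)
    for doc, cluster in assignments.items():
        clusters[cluster].append(truth[doc])
    total = len(assignments)
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        # Shannon entropy (base 2) of the class distribution in this cluster
        h = -sum((c / len(members)) * math.log2(c / len(members)) for c in counts.values())
        entropy += (len(members) / total) * h
    return entropy


def cluster_f_measure(assignments, truth):
    """For each class take the best F1 over all clusters, then weight by class size (higher is better)."""
    clusters = defaultdict(set)
    classes = defaultdict(set)
    for doc, cluster in assignments.items():
        clusters[cluster].add(doc)
        classes[truth[doc]].add(doc)
    total = len(assignments)
    score = 0.0
    for class_docs in classes.values():
        best = 0.0
        for cluster_docs in clusters.values():
            overlap = len(class_docs & cluster_docs)
            if overlap == 0:
                continue
            precision = overlap / len(cluster_docs)
            recall = overlap / len(class_docs)
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (len(class_docs) / total) * best
    return score


# Hypothetical toy data: document name -> cluster, and document name -> true class
assignments = {"A1": "cluster_0", "A2": "cluster_0", "A3": "cluster_1", "A4": "cluster_1"}
truth = {"A1": "x", "A2": "x", "A3": "y", "A4": "x"}

print(cluster_entropy(assignments, truth))
print(cluster_f_measure(assignments, truth))
```

Both functions only need the document names as keys, which is why keeping the file name as id in the clustered output matters.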