The problem is the result of clustering

jabra New Altair Community Member
edited November 2024 in Community Q&A

Dear engineers,
I have a data set with five columns and I want to cluster on the third column, which contains text.
I used the Select Attributes operator to choose the third column for clustering.
In the output, I want all the original columns plus the cluster column.
What should I do?
Thank you very much for your help.



  • lionelderkrikor
    lionelderkrikor New Altair Community Member

    Hi @jabra,


    Can you share your dataset and your process, please?

    Otherwise, can you give an example of what you want to obtain? I have difficulty understanding what you want to do.






  • jabra
    jabra New Altair Community Member

    Thanks for your response.
    I don't have access to the data or my RapidMiner file right now, so I can't send them.
    I have five columns named: id, name, label, Address, Description.
    I want to cluster on the Description column.
    After clustering, the output should contain all the columns plus the cluster column, that is:
      id, name, label, Address, Description and cluster
    That way I can tell which sentences in a cluster have label x.
    Thank you very much for your help.

  • lionelderkrikor
    lionelderkrikor New Altair Community Member

    Hi @jabra


    I propose this process (to be adapted and completed with your own data):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
      <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
    </process>

    Does this process meet your needs?






  • lionelderkrikor
    lionelderkrikor New Altair Community Member

    Hi again @jabra,


    Here is a new version of the previous process (maybe better suited to your needs):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="179" y="34">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <!-- Remove the label before clustering; the "original" port still carries the full data set -->
          <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID" width="90" x="514" y="238"/>
          <operator activated="true" class="concurrency:k_means" compatibility="8.2.000" expanded="true" height="82" name="Clustering" width="90" x="447" y="34">
            <parameter key="k" value="3"/>
          </operator>
          <!-- Keep only the cluster column from the clustered set -->
          <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="581" y="85">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="cluster"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="715" y="85"/>
          <!-- Join the original data (left) with the cluster column (right) on the generated ids -->
          <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="849" y="85">
            <list key="key_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="983" y="85">
            <list key="function_descriptions">
              <parameter key="a1" value="concat(str([a1]),&quot;_&quot;,[cluster])"/>
              <parameter key="a2" value="concat(str([a2]),&quot;_&quot;,[cluster])"/>
              <parameter key="a3" value="concat(str([a3]),&quot;_&quot;,[cluster])"/>
              <parameter key="a4" value="concat(str([a4]),&quot;_&quot;,[cluster])"/>
            </list>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Select Attributes" from_port="original" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>




  • marcin_blachnik
    marcin_blachnik New Altair Community Member

    The easiest and fastest way is to assign a special role to every column except the one you want to cluster. Then you do not need any Select Attributes, joins, etc. This works because RapidMiner uses only the regular attributes for any analysis (clustering, classification, regression, etc.).


    Just add a Set Role operator and type your own role name into "target role".
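
    As a rough sketch of what that could look like in the process XML: the fragment below assumes the column names jabra listed (id, name, label, Address) and leaves Description as the only regular attribute. The role names "meta_name" and "meta_address" are made up for illustration; any custom role name should do as long as each one is used only once. The position and size values are placeholders.

    ```xml
    <!-- Sketch only: column names are taken from the thread, custom role names are hypothetical -->
    <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
      <parameter key="attribute_name" value="id"/>
      <parameter key="target_role" value="id"/>
      <list key="set_additional_roles">
        <parameter key="name" value="meta_name"/>
        <parameter key="label" value="label"/>
        <parameter key="Address" value="meta_address"/>
      </list>
    </operator>
    ```

    With the roles set this way, the clustering operator should ignore the special columns but still pass them through, so the clustered set contains all columns plus the cluster column.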

  • elena2020chao
    elena2020chao New Altair Community Member

    Dear Friends
    I use the Process Documents from Data operator. In addition to the main columns, the label and the cluster, I want the output to include columns for the tokenized words.
    How should I change the process?
    Thank you for helping me too.

  • jabra
    jabra New Altair Community Member

    Thank you very much for the process you sent.
    Just one more thing, dear engineer:
    What if I want to see the results of Tokenize in the output, as in our friend's question (@elena2020chao)?

    And how do I evaluate the result?
    See the error.

    Thanks again if you can send the process.

  • jabra
    jabra New Altair Community Member

    Has anyone ever done this? Who can help me? I really need this ...
    Thank you so much if you help me

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    @jabra You have nominal values in your data set that the performance operator can't use.


    You need to convert everything to numerical values.
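
    One possible way to do that conversion is the Nominal to Numerical operator placed before the performance operator. This is only a sketch: it assumes dummy coding applied to all regular attributes, and the position and size values are placeholders. Note that dummy coding creates one new column per distinct nominal value, so it is probably not a good fit for a free-text column; text is usually turned into numeric word vectors first (for example with Process Documents from Data).

    ```xml
    <!-- Sketch only: converts all regular nominal attributes using dummy coding -->
    <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="313" y="34">
      <parameter key="attribute_filter_type" value="all"/>
      <parameter key="coding_type" value="dummy coding"/>
    </operator>
    ```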

  • jabra
    jabra New Altair Community Member

    Thank you
    I am clustering on the text field
    What should I do?