Cluster algorithms

Selim
New Altair Community Member
What are x-means and k-medoids? And how do these two algorithms differ from k-means?
Best Answer
-
Hello,
Let's start with understanding k-Means. You set a number of clusters (k), and the algorithm decides which cluster each example belongs to by measuring how far the example is from the center (centroid) of each cluster and picking the nearest one. Then the centroid of each cluster is recalculated as the average of the attribute values of all the examples that belong to that cluster. These two steps repeat until the assignments stop changing.
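As a rough illustration of the two k-Means steps (assign each example to its nearest center, then move each center to the mean of its members), here is a minimal sketch in plain Python. The function name, the sample points and the starting centroids are made up for this example, and a real implementation such as the k-Means operator or scikit-learn's `KMeans` handles initialization and convergence far more carefully.

```python
import math

def k_means(points, centroids, iterations=10):
    """Tiny k-Means sketch: points and centroids are lists of equal-length tuples."""
    for _ in range(iterations):
        # Assignment step: each example goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the coordinate-wise mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data: two obvious groups in 2 dimensions.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

Note that the resulting centroids are averages, so they usually do not coincide with any example in the data.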
The k-Medoids algorithm is almost the same as k-Means, with one difference: the center of a cluster is always an actual example from the data (the medoid), rather than an averaged point that may not correspond to any real example.
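To make the difference concrete, the sketch below computes both kinds of center for one cluster. It assumes Euclidean distance and takes the medoid to be the member with the smallest total distance to all other members; the helper names and the sample points are invented for illustration.

```python
import math

def centroid(members):
    """Coordinate-wise mean of the members (what k-Means uses)."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))

def medoid(members):
    """The real member with the smallest total distance to all members (what k-Medoids uses)."""
    return min(members, key=lambda m: sum(math.dist(m, other) for other in members))

# Hypothetical cluster with one far-away example.
members = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0), (15.0, 1.0)]

print(centroid(members))  # an averaged point that is not in the data
print(medoid(members))    # one of the original examples
```

In this data the far-away example pulls the centroid toward it, while the medoid stays on one of the examples in the dense part of the cluster.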
The X-Means algorithm is an extension of k-Means in which you don't have to fix the number of clusters yourself. Instead, you give a range for k, and the algorithm runs a quick heuristic to estimate how many clusters fit that specific example set (the original X-Means paper scores candidate splits with the Bayesian Information Criterion). Apart from that, the algorithm is more or less the same as k-Means.
Explaining these algorithms involves a lot of n-dimensional geometry, because they all rely on distances between points. That is why you need to use them with numerical attributes only.
Hope this helps.
All the best,
Rodrigo.
Answers
Thanks @rfuentealba. I have one more question: are fuzzy c-means and x-means the same thing?
-
Hello @Selim
No, fuzzy clustering algorithms use a different type of function, controlled by a parameter called the "fuzzer" or "fuzzifier", to determine how strongly an example belongs to each cluster. While the idea of clustering remains the same, fuzzy clustering uses similarity, intensity and distance as the three main points of analysis, and one example can belong to more than one cluster, each with its own degree of membership. That isn't possible with k-Means, k-Medoids and X-Means, because these are "hard labeled": every example belongs to exactly one cluster.
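As a small sketch of what a degree of membership looks like, the function below uses the standard fuzzy c-means membership formula for a single point, given fixed cluster centers and a fuzzifier m greater than 1. The function name, centers and point are hypothetical, and this is not the code of the RapidMiner plugin.

```python
import math

def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in every cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center is given full membership there.
    if 0.0 in dists:
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

centers = [(0.0, 0.0), (4.0, 0.0)]      # hypothetical cluster centers
u = memberships((1.0, 0.0), centers)    # a point closer to the first center
hard_label = u.index(max(u))            # what a "hard" algorithm would report

print(u, hard_label)
```

The point gets a high membership in the nearer cluster and a low but non-zero membership in the other, whereas a hard-labeling algorithm would report only the nearest cluster.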
Fuzzy C means is available in the "Information Selection" plugin for RapidMiner. It's not part of the standard RapidMiner, BTW.
All the best,
Rodrigo.
@rfuentealba First of all, thanks for your answers. I have one more question. I have been working on a k-means clustering algorithm for zoning a warehouse. It runs in an Execute Python operator and divides the items into clusters of the same size. I also have an attribute called "volume". As you know, volume is very important for a warehouse, so I want the total volume of every cluster to be equal. How can I do that? I have shared my XML below. I am waiting for your answer.
Kind regards
---------------------<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process"><parameter key="logverbosity" value="init"/><parameter key="random_seed" value="2001"/><parameter key="send_mail" value="never"/><parameter key="notification_email" value=""/><parameter key="process_duration_for_mail" value="30"/><parameter key="encoding" value="SYSTEM"/><process expanded="true"><operator activated="true" class="read_excel" compatibility="9.2.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85"><parameter key="excel_file" value="C:\Users\selimcelebi\Desktop\Yeni Microsoft Excel Çalışma Sayfası.xlsx"/><parameter key="sheet_selection" value="sheet number"/><parameter key="sheet_number" value="1"/><parameter key="imported_cell_range" value="A1"/><parameter key="encoding" value="SYSTEM"/><parameter key="first_row_as_names" value="true"/><list key="annotations"/><parameter key="date_format" value=""/><parameter key="time_zone" value="SYSTEM"/><parameter key="locale" value="English (United States)"/><parameter key="read_all_values_as_polynominal" value="false"/><list key="data_set_meta_data_information"><parameter key="0" value="StockCode.true.integer.attribute"/><parameter key="1" value="Description.true.polynominal.attribute"/><parameter key="2" value="weight(gram).true.integer.attribute"/><parameter key="3" value="volume(cm3).true.integer.attribute"/><parameter key="4" value="quantity.true.integer.attribute"/><parameter key="5" value="UnitPrice.true.real.attribute"/><parameter key="6" value="fragility.true.integer.attribute"/></list><parameter key="read_not_matching_values_as_missings" value="false"/><parameter key="datamanagement" value="double_array"/><parameter key="data_management" value="auto"/></operator><operator activated="true" class="normalize" compatibility="9.2.001" expanded="true" 
height="103" name="Normalize" width="90" x="246" y="85"><parameter key="return_preprocessing_model" value="false"/><parameter key="create_view" value="false"/><parameter key="attribute_filter_type" value="all"/><parameter key="attribute" value=""/><parameter key="attributes" value="|fragility|StockCode|volume(cm3)|weight(gram)|UnitPrice|quantity"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="numeric"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="real"/><parameter key="block_type" value="value_series"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_series_end"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/><parameter key="method" value="Z-transformation"/><parameter key="min" value="0.0"/><parameter key="max" value="1.0"/><parameter key="allow_negative_values" value="false"/></operator><operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="85"><list key="function_descriptions"/><parameter key="keep_all" value="true"/></operator><operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="85"><parameter key="attribute_filter_type" value="subset"/><parameter key="attribute" value="F"/><parameter key="attributes" value="|Description|StockCode"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" 
value="true"/><parameter key="include_special_attributes" value="false"/></operator><operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="715" y="238"><parameter key="attribute_filter_type" value="subset"/><parameter key="attribute" value="F"/><parameter key="attributes" value="|Description"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/></operator><operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="983" y="187"><parameter key="create_nominal_ids" value="false"/><parameter key="offset" value="0"/></operator><operator activated="true" class="set_macros" compatibility="9.2.001" expanded="true" height="82" name="Set Macros" width="90" x="715" y="85"><list key="macros"><parameter key="cluster_number" value="10"/></list></operator><operator activated="true" class="python_scripting:execute_python" compatibility="9.2.000" expanded="true" height="103" name="Execute Python" width="90" x="849" y="85"><parameter key="script" value="import pandas as pd from operator import itemgetter import numpy as np import random import sys from scipy.spatial import distance from sklearn.cluster import KMeans C = %{cluster_number} def k_means(X) : kmeans = KMeans(n_clusters=C, random_state=0).fit(X) return kmeans.cluster_centers_ def samesizecluster( D ): """ in: point-to-cluster-centre distances D, Npt x C out: xtoc, X -> C, equal-size clusters """ Npt, C = D.shape 
clustersize = (Npt + C - 1) // C xcd = list( np.ndenumerate(D) ) # ((0,0), d00), ((0,1), d01) ... xcd.sort( key=itemgetter(1) ) xtoc = np.ones( Npt, int ) * -1 nincluster = np.zeros( C, int ) nall = 0 for (x,c), d in xcd: if xtoc[x] < 0 and nincluster[c] < clustersize: xtoc[x] = c nincluster[c] += 1 nall += 1 if nall >= Npt: break return xtoc def rm_main(data): data_2 = data.values centres = k_means(data_2) D = distance.cdist( data_2, centres ) xtoc = samesizecluster( D ) data['cluster'] = xtoc return data"/><parameter key="use_default_python" value="true"/><parameter key="package_manager" value="conda (anaconda)"/></operator><operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role (2)" width="90" x="983" y="85"><parameter key="attribute_name" value="cluster"/><parameter key="target_role" value="cluster"/><list key="set_additional_roles"/></operator><operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID" width="90" x="1117" y="85"><parameter key="create_nominal_ids" value="false"/><parameter key="offset" value="0"/></operator><operator activated="true" class="concurrency:join" compatibility="9.2.001" expanded="true" height="82" name="Join" width="90" x="1251" y="187"><parameter key="remove_double_attributes" value="true"/><parameter key="join_type" value="inner"/><parameter key="use_id_attribute_as_key" value="true"/><list key="key_attributes"/><parameter key="keep_both_join_attributes" value="false"/></operator><connect from_port="input 1" to_op="Read Excel" to_port="file"/><connect from_op="Read Excel" from_port="output" to_op="Normalize" to_port="example set input"/><connect from_op="Normalize" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/><connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/><connect from_op="Select Attributes" 
from_port="example set output" to_op="Set Macros" to_port="through 1"/><connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (2)" to_port="example set input"/><connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/><connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="left"/><connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/><connect from_op="Execute Python" from_port="output 1" to_op="Set Role (2)" to_port="example set input"/><connect from_op="Set Role (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/><connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/><connect from_op="Join" from_port="join" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="source_input 2" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/></process></operator></process>0 -
Hello @Selim
Are you asking how to get the sum of the volume column for each cluster? If so, you can use the Aggregate operator and group by the cluster column.
If this is not the answer you are looking for, please explain your requirement in a bit more detail.
Thanks
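For readers who want the same aggregation outside the Aggregate operator, here is a minimal sketch in plain Python. The rows, their values and the key names `volume` and `cluster` are assumptions made for this example, not data from the thread.

```python
from collections import defaultdict

def volume_per_cluster(rows):
    """Sum the 'volume' value of every row, grouped by its 'cluster' value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["cluster"]] += row["volume"]
    return dict(totals)

# Hypothetical clustered items.
rows = [
    {"item": "A", "volume": 12, "cluster": 0},
    {"item": "B", "volume": 30, "cluster": 1},
    {"item": "C", "volume": 8, "cluster": 0},
    {"item": "D", "volume": 5, "cluster": 1},
]
totals = volume_per_cluster(rows)
print(totals)
```

Inside an Execute Python operator, where the data arrives as a pandas DataFrame, a call along the lines of `data.groupby("cluster")["volume"].sum()` should give the same per-cluster totals, assuming the columns carry those names.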
Hello @Selim,
I have the same question as Varun. I will work on your problem tomorrow, I promise.
BTW, friendly moderator's advice: you are asking too many questions in the same thread, and that makes it difficult to find proper answers in the future. It's not like we are charging you for writing a new post each time you have a question for the community.
All the best,
Rodrigo.
@varunm1 @rfuentealba First of all, thanks for your answers. I will explain it again.
First, I should say that I am clustering with 4 attributes, one of which is "volume".
I am doing this clustering for a warehouse (storage), so volume is very important to me: when I cluster the items, the total volume of each cluster has to be equal. To give an example:
In the data below, I want cluster 1 to be items 1-3-5 and cluster 2 to be items 2-4-6, because then the total volume of every cluster is the same, namely 60. I hope you get what I mean. If you don't, please tell me. I am waiting for your answer.
item no   volume
2         15
3         20
4         25
5         30
6         20
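One possible way to approach this requirement, sketched here as an assumption rather than as an answer given in the thread, is a greedy balancing heuristic: sort the items by volume, largest first, and always put the next item into the cluster whose total volume is currently smallest. The function name is made up, and the volume of item 1 (10) is not listed in the table; it is inferred from the statement that items 1-3-5 add up to 60.

```python
def balance_by_volume(volumes, n_clusters):
    """Greedy split: biggest items first, each into the currently lightest cluster."""
    totals = [0] * n_clusters
    assignment = {}
    # Sort by volume descending; ties are broken by item number for repeatability.
    for item, volume in sorted(volumes.items(), key=lambda kv: (-kv[1], kv[0])):
        lightest = totals.index(min(totals))
        assignment[item] = lightest
        totals[lightest] += volume
    return assignment, totals

# Item 1 = 10 is an inferred value (so that items 1, 3 and 5 sum to 60).
volumes = {1: 10, 2: 15, 3: 20, 4: 25, 5: 30, 6: 20}
assignment, totals = balance_by_volume(volumes, n_clusters=2)
print(assignment)
print(totals)
```

On this data both clusters end up with the same total volume, although the grouping differs from 1-3-5 and 2-4-6, since several splits give equal sums. The heuristic does not guarantee equal totals in general, and it ignores the other attributes entirely, so it would have to be combined with the distance-based assignment if similar items should also stay together.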
@varunm1 @rfuentealba Do you have any ideas? I really need to solve this problem. If you want, I can share my process as XML so you can see exactly what I am doing.
Kind regards,