Set roles for unsupervised learning - clustering

Hi there,

I have an example dataset as follows:

id_product | Color | reference | quantity_ordered, price_paid, weight_in_grams
1 | Red | A | 11000100
2 | Red | A | 22500200
3 | Blue | A | 22000200
4 | Red | B | 2020000800
5 | Blue | B | 82000650
6 | Red | B | 6005000012000
7 | Blue | B | 54500500
8 | Blue | C | 88000800
9 | Blue | C | 210001500
10 | Red | D | 96500065000
11 | Blue | E | 121200014050
12 | Red | E | 45450003350

I want to perform a clustering operation to detect anomalies, but I am not sure which role I should give to each of my attributes.

I was thinking about:

id_product : id
Color : cluster
reference : label
quantity_ordered : weight
price_paid : regular
weight_in_grams : regular

Am I wrong or right?

Thanks for your help.

lovefinearts198

Can anyone give me a lead, or a way to understand the roles in RapidMiner?

lovefinearts198

Hello again,

Is my question too easy or too complex?

Here is an extract from the RapidMiner help:

Description
This operator can be used to change the role of an attribute of the input ExampleSet. If you want to change the attribute name you should use the Rename operator. The target role indicates if the attribute is a regular attribute (used by learning operators) or a special attribute (e.g. a label or id attribute).

The following target attribute types are possible:
regular: only regular attributes are used as input variables for learning tasks
id: the id attribute for the example set
label: target attribute for learning
prediction: predicted attribute, i.e. the predictions of a learning scheme
cluster: indicates the membership to a cluster
weight: indicates the weight of the example
batch: indicates the membership to an example batch
Users can also define own attribute types by simply using the desired name.

Please be aware that roles have to be unique! Assigning a non-regular role a second time will cause the first attribute to be dropped from the example set. If you want to keep this attribute, you have to change its role first.
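
To make the uniqueness rule concrete, here is a rough Python sketch. It is not RapidMiner code: the `set_role` function and the dictionary of roles are made up for illustration, and they only mimic the behaviour the help text describes (a second attribute taking a non-regular role pushes the first one out).

```python
# Toy model of attribute roles (hypothetical, not the RapidMiner API).
# `roles` maps attribute name -> role name.
def set_role(roles, attribute, new_role):
    """Return (updated_roles, dropped_attributes).

    Mimics the rule quoted above: a non-regular role can be held by
    only one attribute, so the previous holder is dropped.
    """
    updated = dict(roles)
    dropped = []
    if new_role != "regular":
        for name, role in roles.items():
            if role == new_role and name != attribute:
                del updated[name]
                dropped.append(name)
    updated[attribute] = new_role
    return updated, dropped


roles = {"id_product": "id", "Color": "regular", "reference": "label"}

# Giving "label" to Color while reference already holds it drops reference.
roles, dropped = set_role(roles, "Color", "label")
print(dropped)  # ['reference']
print(roles)    # {'id_product': 'id', 'Color': 'label'}
```

In this toy model, setting reference back to regular before the second call would keep it in the set, which matches the last sentence of the help text.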

So perhaps this is better?

id_product : id
Color : regular
reference : label
quantity_ordered : regular
price_paid : regular
weight_in_grams : regular

Andrew2

Hello

Set the attributes you want to use to drive cluster membership to be "regular".

Attributes with any other role will be ignored by the clustering.
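
As a rough picture of what that means, the Python sketch below filters one row down to its regular attributes. It is only an illustration: `ROLES`, `clustering_inputs` and the row values are made up here (the values are not taken from the dataset above), and none of it is RapidMiner code.

```python
# Roles as proposed in the second attempt above.
ROLES = {
    "id_product": "id",
    "Color": "regular",
    "reference": "label",
    "quantity_ordered": "regular",
    "price_paid": "regular",
    "weight_in_grams": "regular",
}


def clustering_inputs(row, roles):
    """Keep only the attributes whose role is 'regular'."""
    return {name: value for name, value in row.items()
            if roles.get(name) == "regular"}


# Made-up row, for illustration only.
row = {
    "id_product": 1,
    "Color": "Red",
    "reference": "A",
    "quantity_ordered": 3,
    "price_paid": 30.0,
    "weight_in_grams": 120,
}

features = clustering_inputs(row, ROLES)
print(features)
# {'Color': 'Red', 'quantity_ordered': 3, 'price_paid': 30.0, 'weight_in_grams': 120}
```

The id and label values stay attached to the row in the tool, they just do not influence which cluster it lands in.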

I don't know your data, but if the attribute called "reference" is some sort of pre-existing classification and you want to compare it with the final clustering, then it makes sense to set its role to label, as you have done. There is an operator called "Map Clustering on Labels" that can be used to determine which cluster is closest to each label. The resulting example set contains a prediction that can be used to compute a performance measure with the "Performance" operator.
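
The idea of comparing clusters with an existing label can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of how the RapidMiner operator works internally: here each cluster is simply mapped to its most frequent label (majority vote), and the function names and sample values are made up.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(clusters, labels):
    """Map each cluster to the label that occurs most often inside it."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        by_cluster[cluster][label] += 1
    return {cluster: counts.most_common(1)[0][0]
            for cluster, counts in by_cluster.items()}


def accuracy(clusters, labels):
    """Share of rows whose mapped cluster label equals the true label."""
    mapping = map_clusters_to_labels(clusters, labels)
    hits = sum(1 for c, l in zip(clusters, labels) if mapping[c] == l)
    return hits / len(labels)


# Made-up cluster assignments and labels, for illustration only.
clusters = ["cluster_0", "cluster_0", "cluster_0",
            "cluster_1", "cluster_1", "cluster_1"]
labels = ["A", "A", "B", "B", "B", "B"]

mapping = map_clusters_to_labels(clusters, labels)
score = accuracy(clusters, labels)
print(mapping)  # {'cluster_0': 'A', 'cluster_1': 'B'}
print(score)
```

The mapped label plays the part of the prediction, and the score is the kind of figure a performance measure would report.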

Regards,

Andrew