Set roles for unsupervised learning - clustering

lovefinearts198 New Altair Community Member
edited November 5 in Community Q&A
Hi there,

I have an example dataset as follows:

id_product  Color  reference  quantity_ordered  price_paid  weight_in_grams
1           Red    A          1                 1000        100
2           Red    A          2                 2500        200
3           Blue   A          2                 2000        200
4           Red    B          20                20000       800
5           Blue   B          8                 2000        650
6           Red    B          600               50000       12000
7           Blue   B          5                 4500        500
8           Blue   C          8                 8000        800
9           Blue   C          2                 1000        1500
10          Red    D          9                 6500        65000
11          Blue   E          12                12000       14050
12          Red    E          45                45000       3350

I want to perform a clustering operation to detect anomalies, but I am not sure which role I should give to each attribute.

I was thinking of:
  • id_product : id
  • Color : cluster
  • reference : label
  • quantity_ordered : weight
  • price_paid : regular
  • weight_in_grams : regular
Am I right or wrong?

Thanks for your help.

Answers

  • lovefinearts198 New Altair Community Member
    Can anyone give me a lead, or a way to understand the roles in RapidMiner?
  • lovefinearts198 New Altair Community Member
    Hello again,

    Is my question too easy or too complex?

    Here is an extract from the RapidMiner help:
    Description
    This operator can be used to change the role of an attribute of the input ExampleSet. If you want to change the attribute name you should use the Rename operator. The target role indicates if the attribute is a regular attribute (used by learning operators) or a special attribute (e.g. a label or id attribute).

    The following target attribute types are possible:
    • regular: only regular attributes are used as input variables for learning tasks
    • id: the id attribute for the example set
    • label: target attribute for learning
    • prediction: predicted attribute, i.e. the predictions of a learning scheme
    • cluster: indicates the membership to a cluster
    • weight: indicates the weight of the example
    • batch: indicates the membership to an example batch
    Users can also define own attribute types by simply using the desired name.

    Please be aware that roles have to be unique! Assigning a non-regular role a second time will cause the first attribute to be dropped from the example set. If you want to keep this attribute, you have to change its role first.

    So perhaps this is better?
    • id_product : id
    • Color : regular
    • reference : label
    • quantity_ordered : regular
    • price_paid : regular
    • weight_in_grams : regular
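
The two rules in the help extract above (only regular attributes are used as inputs, and a special role can be held by one attribute at a time) can be sketched in a few lines of Python. This is only a toy model for illustration: `RoleTable`, `set_role` and `regular_attributes` are hypothetical names, not RapidMiner's API, and the dropping behaviour is modelled from the help text alone.

```python
class RoleTable:
    """Toy model of attribute roles (hypothetical, not RapidMiner code)."""

    def __init__(self, attributes):
        # Every attribute starts out as a regular attribute.
        self.roles = {name: "regular" for name in attributes}

    def set_role(self, name, role):
        if name not in self.roles:
            raise KeyError(name)
        if role != "regular":
            # Special roles are unique: per the help text, the attribute
            # that previously held the role is dropped from the set.
            for other, other_role in list(self.roles.items()):
                if other != name and other_role == role:
                    del self.roles[other]
        self.roles[name] = role

    def regular_attributes(self):
        # Only these would be used as inputs by a learner or clusterer.
        return [n for n, r in self.roles.items() if r == "regular"]


table = RoleTable(["id_product", "Color", "reference",
                   "quantity_ordered", "price_paid", "weight_in_grams"])
table.set_role("id_product", "id")
table.set_role("reference", "label")
inputs = table.regular_attributes()
print(inputs)  # ['Color', 'quantity_ordered', 'price_paid', 'weight_in_grams']

# Giving the label role to a second attribute drops the first label attribute.
table.set_role("Color", "label")
print("reference" in table.roles)  # False
```

With the second proposal, the four remaining regular attributes are the ones that would drive the clustering.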
  • Hello

    Set the attributes you want to use to drive cluster membership to be "regular".

    All other types will be ignored by the clustering.
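
As a rough illustration of what "ignored by the clustering" means, here is a minimal k-means sketch in plain Python in which only the regular attributes enter the distance computation. The row values, the `ROLES` dictionary and the helper names are all assumptions made up for this example, and the algorithm is a generic k-means, not necessarily what any RapidMiner operator does internally.

```python
import math

# Hypothetical rows; the numeric values are invented for illustration.
ROWS = [
    {"id_product": 1, "reference": "A", "price_paid": 1.0, "weight_in_grams": 1.0},
    {"id_product": 2, "reference": "A", "price_paid": 1.2, "weight_in_grams": 0.8},
    {"id_product": 3, "reference": "B", "price_paid": 8.0, "weight_in_grams": 8.0},
    {"id_product": 4, "reference": "B", "price_paid": 8.2, "weight_in_grams": 7.9},
]
ROLES = {"id_product": "id", "reference": "label",
         "price_paid": "regular", "weight_in_grams": "regular"}


def feature_vector(row, roles):
    """Keep only the regular attributes, in a stable (sorted) order."""
    return [float(row[name]) for name in sorted(roles) if roles[name] == "regular"]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assignment = []
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# The id and label columns travel with each row but never reach kmeans().
points = [feature_vector(row, ROLES) for row in ROWS]
clusters = kmeans(points, k=2)
for row, cluster in zip(ROWS, clusters):
    print(row["id_product"], row["reference"], "cluster_%d" % cluster)
```

Changing a role in `ROLES` from "regular" to anything else removes that column from the feature vectors, which is the effect described above.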

    I don't know your data, but if the attribute called "reference" is some sort of pre-existing classification that you want to compare with the final clustering, then it makes sense to set its role to label, as you have done. There is an operator called "Map Clustering on Labels" that can be used to determine which cluster is closest to each label. The resulting example set contains a prediction, which can be used to compute a performance measure with the "Performance" operator.
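
The idea of comparing clusters with a pre-existing label can be sketched as follows. This uses a simple majority vote per cluster, which is an assumption chosen for illustration; the actual operator may compute its mapping differently. The cluster ids, the labels and the function names are hypothetical.

```python
from collections import Counter


def map_clusters_to_labels(clusters, labels):
    """Map each cluster id to the label that occurs most often inside it."""
    votes = {}
    for cluster, label in zip(clusters, labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0]
            for cluster, counts in votes.items()}


def accuracy(predictions, labels):
    """Fraction of examples whose mapped prediction equals the label."""
    hits = sum(1 for p, l in zip(predictions, labels) if p == l)
    return hits / len(labels)


# Hypothetical clustering result and pre-existing labels.
clusters = [0, 0, 1, 1, 1, 0]
labels = ["A", "A", "B", "B", "A", "A"]

mapping = map_clusters_to_labels(clusters, labels)
predictions = [mapping[c] for c in clusters]
score = accuracy(predictions, labels)
print(mapping)
print(predictions)
print(score)
```

The mapped predictions play the role of the prediction attribute mentioned above, and the accuracy stands in for the performance measure.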

    regards

    Andrew