RapidMiner 5 documentation?

im New Altair Community Member
edited November 5 in Community Q&A
Hi all.
This is my first message in this forum.
I am also a beginner in data mining. I have been studying the theoretical part, and now I would like to try my first practical example…

As far as I know, there is no RapidMiner manual for version 5, right?
Do you know of any places with tutorials or documentation for version 5?


I have a data set with animal attributes, where Animal_Type (e.g. mammal, bird, fish, etc.) is the class attribute to be used for prediction.

I would like to use clustering to see how RapidMiner groups the animals, and then create a model for prediction.

Note: I know that decision trees are commonly used, but for now I would like to use clustering.

I already have a RapidMiner process with a ReadCSV operator that opens the CSV data set and a ChangeAttributeRole operator that sets Animal_Type as the "label attribute", and I can see the output when I run the process.

Then I added the k-means clustering algorithm with k = 2 (after adding a Nominal to Numerical operator), and I see the two final clusters of animals. But since the initial data set has four different classes, this is not correct. So I configured the clustering algorithm with k = 4 and ran it again. The animals are now grouped into four clusters, but the clusters do not make sense. Why doesn't the clustering algorithm choose Animal_Type as the field to split them into different clusters (I know that there is a centroid)? I am starting to think that k-means is not the right algorithm. Could you point me in the right direction?
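Regarding the Nominal to Numerical step: k-means works on numeric coordinates, so nominal attributes have to be turned into numbers first. Below is a minimal Python sketch of dummy (one-hot) coding, one common way to do this; it is not RapidMiner's implementation, and the attribute names (`hair`, `feathers`, `fins`) and values are hypothetical.

```python
def dummy_code(rows, attributes):
    """One-hot ("dummy") encode nominal attributes.

    Every distinct value of a nominal attribute becomes its own 0/1 column,
    so that a distance-based algorithm such as k-means can use it.
    """
    # Distinct values per attribute, sorted so the column order is stable
    values = {a: sorted({row[a] for row in rows}) for a in attributes}
    columns = [f"{a}={v}" for a in attributes for v in values[a]]
    encoded = [
        [1.0 if row[a] == v else 0.0 for a in attributes for v in values[a]]
        for row in rows
    ]
    return columns, encoded


# Hypothetical animal records; Animal_Type is deliberately left out,
# because it is the label and not an input to the clustering.
animals = [
    {"hair": "yes", "feathers": "no", "fins": "no"},
    {"hair": "no", "feathers": "yes", "fins": "no"},
    {"hair": "no", "feathers": "no", "fins": "yes"},
]

columns, encoded = dummy_code(animals, ["hair", "feathers", "fins"])
print(columns)
print(encoded)
```

Each yes/no attribute becomes two complementary columns here; that redundancy is harmless for a sketch, but it does affect the distances k-means computes.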


Next, I would like to do the same using decision trees (it seems that decision trees are easy to use in RapidMiner) and then find out which method is more accurate (clustering or decision trees). Which is the right RapidMiner operator to compute and compare the accuracy of the two models (clustering vs. decision tree) for this data set? A simple example process would be very much appreciated.


Thank you.

I.M.

Answers

  • fischer
    fischer New Altair Community Member
    Hi,

    Regarding the documentation: We will have brand-new documentation soon, but it will be in German. The English version will follow.

    1st question (why does k-Means not use Animal_Type?): You marked Animal_Type as the label. The clustering algorithm does not see the label attribute at all; it only uses the regular attributes. After all, telling the clustering algorithm beforehand what the clustering should be would not make much sense, would it?
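    To illustrate this point, here is a minimal Python sketch (not RapidMiner's code) in which only the regular attributes are turned into coordinates, so the label never reaches the algorithm. The attribute names and values are hypothetical, and using the first k points as initial centroids is a simplification to keep the example deterministic; real implementations usually choose them randomly.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means with Euclidean distance.

    The first k points serve as initial centroids (a simplification).
    Returns one cluster index per point plus the final centroids.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical numeric examples; "label" plays the role of Animal_Type.
examples = [
    {"hair": 1, "feathers": 0, "fins": 0, "legs": 4, "label": "mammal"},
    {"hair": 0, "feathers": 1, "fins": 0, "legs": 2, "label": "bird"},
    {"hair": 0, "feathers": 0, "fins": 1, "legs": 0, "label": "fish"},
    {"hair": 1, "feathers": 0, "fins": 0, "legs": 4, "label": "mammal"},
    {"hair": 0, "feathers": 1, "fins": 0, "legs": 2, "label": "bird"},
    {"hair": 0, "feathers": 0, "fins": 1, "legs": 0, "label": "fish"},
]
regular = ["hair", "feathers", "fins", "legs"]

# Only the regular attributes become coordinates; the label is never passed in.
points = [[row[a] for a in regular] for row in examples]
clusters, centroids = kmeans(points, k=3)
print(clusters)
```

    The output is a list of cluster indices (0, 1, 2), not animal types, and the numbering is arbitrary. In this made-up data the clusters happen to coincide with the types because the groups are perfectly separated; with real attributes there is no guarantee that the geometry of the data lines up with the label.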

    2nd question (how to compare clustering and prediction): If you do things like this, always ask yourself whether the comparison makes sense. You can use the "Map Clustering to Labels" operator to turn the cluster attribute into a "best fitting" prediction attribute. After that, you can use the regular performance operators to compute whatever performance measure you are interested in.
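    The exact matching strategy of that operator is not spelled out here, so the following Python sketch uses a simple stand-in: each cluster is assigned the most frequent true label among its members, and accuracy is then the fraction of examples whose mapped label equals the true label. The cluster indices and labels are hypothetical.

```python
from collections import Counter


def map_clusters_to_labels(clusters, labels):
    """Map each cluster to the most frequent true label among its members.

    Majority voting is an assumed stand-in for a "best fitting" mapping;
    it may map two clusters to the same label.
    """
    mapping = {}
    for c in set(clusters):
        members = [lab for cl, lab in zip(clusters, labels) if cl == c]
        mapping[c] = Counter(members).most_common(1)[0][0]
    # Every example gets the label of its cluster as its prediction
    return [mapping[c] for c in clusters]


def accuracy(predictions, labels):
    """Fraction of examples whose prediction equals the true label."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Hypothetical clustering result and true Animal_Type values
clusters = [0, 0, 1, 1, 1, 2]
labels = ["mammal", "mammal", "bird", "bird", "fish", "fish"]

predicted = map_clusters_to_labels(clusters, labels)
print(predicted)
print(accuracy(predicted, labels))  # 5 of 6 examples correct
```

    Once the clusters are expressed as predicted labels like this, the same accuracy function applies unchanged to a decision tree's predictions, which is what makes the two approaches comparable at all.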

    Cheers,
    Simon
  • Robertk
    Robertk New Altair Community Member
    Hi Simon,

    sorry for bringing up an old thread:

    any news on the new (German) documentation?
  • land
    land New Altair Community Member
    Hi,
    yes, it is currently being laid out for final publication and translation. I will keep you informed as soon as we have published it.

    Greetings,
    Sebastian
  • Ghostrider
    Ghostrider New Altair Community Member
    Any update on the English RM 5 manual?
  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    we expect the translation to be finished at the beginning of July.

    Cheers,
    Ingo