
RapidMiner 5 documentation?

User: "im"
New Altair Community Member
Updated by Jocelyn
Hi all.
My first message in this forum.
I am also a beginner in data mining. I have been studying the theoretical part, and now I would like to try my first practical example…

To the best of my knowledge, there is no RapidMiner manual for version 5, right?
Do you know of any places with tutorials or documentation for version 5?


I have a data set with animal attributes, where Animal_Type (e.g. mammal, bird, fish, etc.) is the class attribute to be used for prediction.

I would like to use clustering to see RapidMiner group the animals and to create a model for prediction.

Note: I know that decision trees are commonly used, but for now I would like to use clustering.

I already have a RapidMiner process that reads the CSV data set (ReadCSV) and uses ChangeAttributeRole to set Animal_Type as the “label attribute”, and I can see the output when I run the process.

Then I added the k-means clustering algorithm with k = 2 (after adding a nominal-to-numerical operator), and I see the final two clusters of animals. But as the initial data set has four different classes, this is not correct. So I configured the clustering algorithm with k = 4 and ran it again. The animals are now grouped into four clusters, but the clusters do not make sense. Why does the clustering algorithm not choose Animal_Type as the field for splitting the animals into different clusters? (I know that there is a centroid.) So I am thinking that k-means is not the right algorithm. Could you please point me in the right direction?
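To make the k-means behaviour described above concrete, here is a minimal sketch in plain Python (not RapidMiner). It assumes a hypothetical toy data set with four made-up numeric attributes, three classes instead of four, and fixed starting centroids, so none of the names or values come from the real data set. The point it illustrates is that k-means only measures distances between the regular attributes and never looks at the label, so the clusters need not line up with Animal_Type.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute centroids, until the assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        new = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
               for p in points]
        if new == assignments:
            break
        assignments = new
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignments

# Hypothetical attributes: (has_feathers, lays_eggs, lives_in_water, legs / 4)
animals = [
    ("cat",     (0.0, 0.0, 0.0, 1.0), "mammal"),
    ("dog",     (0.0, 0.0, 0.0, 1.0), "mammal"),
    ("dolphin", (0.0, 0.0, 1.0, 0.0), "mammal"),
    ("sparrow", (1.0, 1.0, 0.0, 0.5), "bird"),
    ("eagle",   (1.0, 1.0, 0.0, 0.5), "bird"),
    ("trout",   (0.0, 1.0, 1.0, 0.0), "fish"),
    ("salmon",  (0.0, 1.0, 1.0, 0.0), "fish"),
]
points = [features for _, features, _ in animals]
labels = [label for _, _, label in animals]

# The label is NOT passed to k_means; only the numeric attributes are.
start = [points[0], points[3], points[5]]  # fixed start, for repeatability
assignments = k_means(points, start)

# Cross-tabulate clusters against the known labels.
crosstab = {}
for cluster, label in zip(assignments, labels):
    crosstab.setdefault(cluster, Counter())[label] += 1
for cluster in sorted(crosstab):
    print(cluster, dict(crosstab[cluster]))
```

In this toy data the dolphin lands in the same cluster as the fish, because its attribute values are closer to theirs, even though its label is mammal. A cross-tabulation like this is one way to check how well clusters agree with a known class.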


Next, I would like to do the same using decision trees (it seems that decision trees are easy to use in RapidMiner) and then find out which method is more accurate (clustering or decision trees). Which is the right RapidMiner operator to compute model accuracy (clustering vs. decision tree) for this data set? A simple example process would be much appreciated.
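One possible way to put both methods on the same accuracy scale, sketched in plain Python rather than as a RapidMiner process: map each cluster to the most frequent label among its members, then count how many examples match. All values below (the labels, the cluster ids, and the tree predictions) are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter, defaultdict

def accuracy(predicted, actual):
    """Fraction of examples whose predicted label equals the actual label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

def clusters_to_labels(cluster_ids, actual):
    """Turn cluster ids into label predictions by majority vote per cluster."""
    by_cluster = defaultdict(Counter)
    for c, a in zip(cluster_ids, actual):
        by_cluster[c][a] += 1
    majority = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    return [majority[c] for c in cluster_ids]

# Hypothetical values, for illustration only.
actual = ["mammal", "mammal", "mammal", "bird", "bird", "fish", "fish"]
cluster_ids = [0, 0, 2, 1, 1, 2, 2]  # e.g. output of a clustering run
tree_predictions = ["mammal", "mammal", "mammal", "bird", "bird", "fish", "bird"]

cluster_acc = accuracy(clusters_to_labels(cluster_ids, actual), actual)
tree_acc = accuracy(tree_predictions, actual)
print(f"clustering (majority mapping): {cluster_acc:.3f}")
print(f"decision tree:                 {tree_acc:.3f}")
```

Note that the majority mapping uses the true labels, so the clustering figure is optimistic. For a fair comparison, both numbers would need to be measured on examples that were held out from training.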


Thank you.

I.M.
