Generating new data out of modeling results

Question

Hi,

I am a new user of Rapidminer and I have less experience in data mining. I am looking for a way to select lets say the best 20 % of a customer base, based on the results of a decision tree, a neural net or something.

Is there an operator which is able to write a new table considering the result of a previous modeling operator? The ideal case would be an operator, which sets something like a scoring attribute so that I can generate a new table and manually select the top rated data sets.

I would be grateful for any help.

Dolf · Answer

Hi Marius, thank you very much. I have an another problem. In my example code I discretize attribute 1 creating 10 bins. If I switch to the results view and activate the bars chart with these 10 bins in the x-axis, the bars are listed in alphabetical order (range1, range10, range2...). If I use the replace operator and replace the values manually, e.g. range1 with 01, it replaces also range10 with 010 :-[ Is there a way to put the bars into correct order? Or better: is there a way to name the bins right from the start as I want (as I can do it in KNIME ;D)?

MariusHelf · Answer

Hi, in RapidMiner we separate the steps of model creation (training) and model application. From your description it seems that so far you only did the training step which results in a decision tree. Now you can apply that decision on new data. The result will be an example set with three additional attributes: the prediction (e.g. true or false), and so-called confidences. The confidence is a measure of how sure or confident the model is, that the input data is of a certain class. Please have a look at the attached process for a basic example of model training and application. In addition to the aforementioned operators, the process uses the Split operator to divide the input data into a set for training and a set for application. For a deeper understanding of RapidMiner's concept I would like to direct your attention to our video tutorials and other documentation resources on our website at http://rapid-i.com . You'll find all the documentation in the Documentation menu on top of the website. Best regards, Marius

Dolf · Answer

I'll try to exemplify my request. Let's say I have a data table containing several attributes about 1000 customers. Now I would like to know the probability, if they will buy a special product. I want to choose the best 20 % with the best response probability, based on a decision tree.

In cases like this KNIME decision tree gives me the option to append columns with normalized class distribution to the table, so that I can choose my best 20 % out of it. In contrast with KNIME, the decision tree operator of rapidminer delivers only a tree which doesn't solve my problem.