Generating new data out of modeling results
I am a new user of RapidMiner and have little experience in data mining. I am looking for a way to select, let's say, the best 20 % of a customer base, based on the results of a decision tree, a neural net or a similar model.
Is there an operator that can write a new table based on the result of a previous modeling operator? Ideally, the operator would add something like a scoring attribute, so that I can generate a new table and manually select the top-rated records.
I would be grateful for any help.
I'll try to illustrate my request. Let's say I have a data table containing several attributes for 1000 customers. Now I would like to know, for each customer, the probability that they will buy a particular product. I want to choose the 20 % with the highest response probability, based on a decision tree.
In cases like this, the KNIME decision tree gives me the option to append columns with the normalized class distribution to the table, so that I can choose my best 20 % from it. In contrast to KNIME, the decision tree operator of RapidMiner delivers only a tree, which doesn't solve my problem.
In RapidMiner we separate the steps of model creation (training) and model application. From your description it seems that so far you have only done the training step, which results in a decision tree. Now you can apply that decision tree to new data with the Apply Model operator.
The result will be an example set with three additional attributes: the prediction (e.g. true or false) and the so-called confidences, one per class. A confidence is a measure of how sure the model is that an example belongs to a certain class.
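Once every example carries a confidence, picking the best 20 % comes down to sorting by that confidence and keeping the first fifth of the rows. The sketch below illustrates the idea in plain Python rather than in RapidMiner itself; the customer records, the attribute name `confidence(buy)` and the helper `top_fraction` are hypothetical and only serve the illustration.

```python
import math

def top_fraction(rows, confidence_key, fraction=0.2):
    """Return the top `fraction` of rows, ranked by descending confidence."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Highest confidence first; sorted() is stable, so ties keep input order.
    ranked = sorted(rows, key=lambda r: r[confidence_key], reverse=True)
    # Round up so a non-empty input always yields at least one row;
    # the small epsilon guards against floating-point overshoot.
    keep = math.ceil(len(ranked) * fraction - 1e-9)
    return ranked[:keep]

# Hypothetical scored example set: one confidence value per customer.
confidences = [0.35, 0.91, 0.12, 0.78, 0.50, 0.66, 0.05, 0.83, 0.44, 0.27]
scored = [
    {"customer_id": i, "confidence(buy)": c}
    for i, c in enumerate(confidences, start=1)
]

best = top_fraction(scored, "confidence(buy)", fraction=0.2)
best_ids = [row["customer_id"] for row in best]
print(best_ids)
```

Inside a RapidMiner process the same two steps would presumably be a sort on the confidence attribute followed by a filter on the first rows. Operators such as Sort and Filter Example Range are likely candidates, but treat those names as assumptions and check them against your version.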
Please have a look at the attached process for a basic example of model training and application. In addition to the aforementioned operators, the process uses the Split Data operator to divide the input data into a set for training and a set for application.
For a deeper understanding of RapidMiner's concepts I would like to direct your attention to our video tutorials and other documentation resources on our website. You'll find all the documentation in the Documentation menu at the top of the website.
Best regards,
Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="549" width="567">
      <operator activated="true" class="retrieve" compatibility="5.3.000" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="120">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="5.3.000" expanded="true" height="94" name="Split Data" width="90" x="179" y="120">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
      </operator>
      <operator activated="true" class="decision_tree" compatibility="5.3.000" expanded="true" height="76" name="Decision Tree" width="90" x="313" y="30"/>
      <operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="Apply Model" width="90" x="447" y="120">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve Sonar" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="90"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Hi Marius,
thank you very much.
I have another problem. In my example process I discretize attribute 1 into 10 bins. If I switch to the results view and activate the bar chart with these 10 bins on the x-axis, the bars are listed in alphabetical order (range1, range10, range2...). If I use the Replace operator and replace the values manually, e.g. range1 with 01, it also replaces range10 with 010 :-[
Is there a way to put the bars into the correct order? Or better: is there a way to name the bins as I want right from the start (as I can in KNIME ;D)?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="549" width="681">
      <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="210">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="discretize_by_bins" compatibility="5.2.008" expanded="true" height="94" name="Discretize" width="90" x="246" y="210">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="attribute_1"/>
        <parameter key="number_of_bins" value="10"/>
      </operator>
      <connect from_op="Retrieve Sonar" from_port="output" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Discretize" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
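The ordering problem described above has two parts: the chart sorts the bin names as strings, and a plain replacement of "range1" also hits the start of "range10". The Python sketch below reproduces both effects and shows one possible fix, zero-padding the numeric suffix with a pattern that matches the whole label. It assumes the bin labels are exactly `range1` to `range10` as quoted in the post; the helpers `naive_replace` and `zero_pad` are hypothetical names for illustration.

```python
import re

# Bin labels as quoted in the post (assumed to be exactly "rangeN").
labels = [f"range{i}" for i in range(1, 11)]

# String sorting compares character by character, so "range10" < "range2".
alphabetical = sorted(labels)

def naive_replace(label):
    """Plain substring replacement: also rewrites the start of 'range10'."""
    return label.replace("range1", "01")

def zero_pad(label, width=2):
    """Pad the numeric suffix so string order equals numeric order."""
    match = re.fullmatch(r"range(\d+)", label)
    if match is None:
        return label  # leave labels of any other shape untouched
    return f"range{int(match.group(1)):0{width}d}"

padded = sorted(zero_pad(label) for label in labels)

print(alphabetical[:3])
print(naive_replace("range10"))
print(padded)
```

If the Replace operator interprets its pattern as a regular expression (an assumption worth verifying in your RapidMiner version), the same idea should carry over: a pattern anchored at the end of the value, such as `range(\d)$`, with a replacement that re-inserts the captured digit after a leading zero, would leave range10 alone.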