How to apply a Score dataset with no Target values to a model in RapidMiner
Seyhan
New Altair Community Member
Hi All,
I was wondering if anybody knows how to apply a score dataset to a data mining model in RapidMiner.
I can easily use training and test datasets to measure the classification accuracy of a data mining model.
But I do not know how to apply a score dataset that has no values for the target attribute.
I have a score dataset but do not know if there is a way to apply it to a DM model in RapidMiner. Unfortunately, I cannot skip the score dataset part and must use it to make sure the model I created works well.
Regards,
Seyhan
:-\
Hi,
this should be quite simple: as long as the score dataset contains the same regular attributes as the training data (special attributes like the label are not needed), you can simply feed it into an Apply Model operator. If you also feed your previously trained model into this operator, it will calculate the scores for you.
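To illustrate the idea outside RapidMiner, here is a minimal pure-Python sketch (not RapidMiner's API). A toy nearest-centroid classifier stands in for the trained model, and the hypothetical `apply_model` function only requires the regular attributes the model was trained on; it does not care whether a `TARGET_LABEL` value is present. The attribute names `x1`/`x2` and the `prediction` key are made up for this example.

```python
from math import dist

LABEL = "TARGET_LABEL"  # label name taken from the thread; only needed for training


def train_nearest_centroid(rows, attributes):
    """Learn one centroid (mean vector of the regular attributes) per class."""
    sums, counts = {}, {}
    for row in rows:
        cls = row[LABEL]
        vec = sums.setdefault(cls, [0.0] * len(attributes))
        for i, attr in enumerate(attributes):
            vec[i] += row[attr]
        counts[cls] = counts.get(cls, 0) + 1
    centroids = {cls: [s / counts[cls] for s in vec] for cls, vec in sums.items()}
    return {"attributes": list(attributes), "centroids": centroids}


def apply_model(model, rows):
    """Score rows that carry the model's regular attributes; a label is not required."""
    attrs = model["attributes"]
    scored = []
    for row in rows:
        missing = [a for a in attrs if a not in row]
        if missing:
            raise ValueError(f"score row lacks regular attributes: {missing}")
        point = [row[a] for a in attrs]
        # Predict the class whose centroid is closest in Euclidean distance.
        pred = min(model["centroids"], key=lambda c: dist(point, model["centroids"][c]))
        scored.append({**row, "prediction": pred})
    return scored


training = [
    {"x1": 1.0, "x2": 1.0, LABEL: "no"},
    {"x1": 1.5, "x2": 2.0, LABEL: "no"},
    {"x1": 8.0, "x2": 8.0, LABEL: "yes"},
    {"x1": 9.0, "x2": 7.5, LABEL: "yes"},
]
score = [{"x1": 2.0, "x2": 1.0}, {"x1": 8.5, "x2": 9.0}]  # no TARGET_LABEL values

model = train_nearest_centroid(training, ["x1", "x2"])
for row in apply_model(model, score):
    print(row)
```

The score rows get a prediction added, just as Apply Model adds a prediction attribute to its labelled data output.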
Here's a little example process (RapidMiner 5.0 format):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="370" width="614">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Training Data" width="90" x="45" y="30"/>
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Score Data" width="90" x="112" y="255"/>
      <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="246" y="30"/>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="514" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Training Data" from_port="output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Score Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Greetings,
Sebastian
Thanks.
But I still could not run the XML you added. I am not an expert on RapidMiner and do not know half of it. But I have added the XML of my model below.
I would appreciate it if you could tell me where to add the score data to the model applier, since the applier has no subsection.
Regards,
Seyhan
<operator name="Root" class="Process" expanded="yes">
  <operator name="CSVExampleSource" class="CSVExampleSource">
    <parameter key="filename" value="C:\PAKDD2010\sample_modelV3.csv"/>
    <parameter key="label_name" value="TARGET_LABEL"/>
  </operator>
  <operator name="Bootstrapping" class="Bootstrapping">
  </operator>
  <operator name="Nominal2Numerical" class="Nominal2Numerical">
  </operator>
  <operator name="XValidation" class="XValidation" expanded="yes">
    <operator name="AdaBoost" class="AdaBoost" expanded="yes">
      <operator name="KernelNaiveBayes" class="KernelNaiveBayes">
      </operator>
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
      <operator name="ModelApplier" class="ModelApplier">
        <parameter key="keep_model" value="true"/>
        <list key="application_parameters">
        </list>
        <parameter key="create_view" value="true"/>
      </operator>
      <operator name="Performance" class="Performance">
      </operator>
      <operator name="ResultWriter" class="ResultWriter">
        <parameter key="result_file" value="G:\Rapping\model_results.csv"/>
      </operator>
    </operator>
  </operator>
</operator>
Hi,
here's your modified process. You can use the model output of the XValidation to get a model trained on the complete data set that was forwarded into the X-Validation.
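The point about the X-Validation model output can be illustrated outside RapidMiner. Below is a minimal pure-Python sketch, assuming a toy 1-nearest-neighbour learner on a single numeric attribute in place of AdaBoost/Kernel Naive Bayes; the function names are hypothetical and not RapidMiner API. The k folds only produce the performance estimate; the returned model, which is then applied to the unlabelled score values, is trained on all labelled rows.

```python
def train_1nn(rows):
    """Toy stand-in learner: 1-nearest-neighbour just memorises (value, label) pairs."""
    return list(rows)


def predict_1nn(model, x):
    """Return the label of the stored example whose value is closest to x."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


def x_validation(rows, k):
    """Estimate accuracy with k-fold cross-validation, then train one final model on all rows."""
    folds = [rows[i::k] for i in range(k)]  # simple interleaved split, no shuffling
    accuracies = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train_1nn(train)
        correct = sum(predict_1nn(model, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    final_model = train_1nn(rows)  # analogous to the X-Validation "model" output
    return sum(accuracies) / k, final_model


labelled = [(1.0, "no"), (1.2, "no"), (1.4, "no"), (2.0, "no"),
            (7.0, "yes"), (7.5, "yes"), (8.0, "yes"), (9.0, "yes")]
accuracy, model = x_validation(labelled, k=4)
score_values = [1.1, 8.6]  # unlabelled score data
print(accuracy, [predict_1nn(model, x) for x in score_values])
```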
Please take a look at the XValidation and AdaBoost operators: you forgot to connect the outputs of the inner operators to the endpoints of their subprocesses.
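Unconnected outputs like these can also be spotted in the saved process XML. A rough sketch, assuming a RapidMiner 5 process file like the ones in this thread and using only Python's standard `xml.etree.ElementTree`: the hypothetical `dangling_operators` helper lists every operator (other than the top-level process) that never appears as `from_op` in a `<connect>` element. It only checks that each operator has at least one outgoing connection, not that every port is wired.

```python
import xml.etree.ElementTree as ET


def dangling_operators(process_xml):
    """List operators that never appear as the source (from_op) of a <connect> element."""
    root = ET.fromstring(process_xml)
    # Every <operator> at any nesting depth, except the top-level "process" operator.
    names = [op.get("name") for op in root.iter("operator") if op.get("class") != "process"]
    # Names of operators that feed at least one connection.
    used = {conn.get("from_op") for conn in root.iter("connect")}
    return [name for name in names if name not in used]


# Hypothetical excerpt: the inner learner is wired in but its model is never wired out.
broken = """<process version="5.0">
  <operator class="process" name="Root">
    <process>
      <operator class="adaboost" name="AdaBoost">
        <process>
          <operator class="naive_bayes_kernel" name="KernelNaiveBayes"/>
          <connect from_port="training set" to_op="KernelNaiveBayes" to_port="training set"/>
        </process>
      </operator>
    </process>
  </operator>
</process>"""

# Same excerpt with both models connected to their subprocess endpoints.
fixed = """<process version="5.0">
  <operator class="process" name="Root">
    <process>
      <operator class="adaboost" name="AdaBoost">
        <process>
          <operator class="naive_bayes_kernel" name="KernelNaiveBayes"/>
          <connect from_port="training set" to_op="KernelNaiveBayes" to_port="training set"/>
          <connect from_op="KernelNaiveBayes" from_port="model" to_port="model"/>
        </process>
      </operator>
      <connect from_op="AdaBoost" from_port="model" to_port="result 1"/>
    </process>
  </operator>
</process>"""

print(dangling_operators(broken))  # ['AdaBoost', 'KernelNaiveBayes']
print(dangling_operators(fixed))   # []
```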
I would strongly recommend taking a look at all the sample processes delivered with RapidMiner and the videos linked on our website to get an understanding of how RapidMiner works.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Root">
    <process expanded="true" height="460" width="966">
      <operator activated="true" class="read_csv" expanded="true" height="60" name="CSVExampleSource" width="90" x="45" y="30"/>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="180" y="30">
        <parameter key="name" value="TARGET_LABEL"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="sample_bootstrapping" expanded="true" height="76" name="Bootstrapping" width="90" x="315" y="30"/>
      <operator activated="true" class="nominal_to_numerical" expanded="true" height="94" name="Nominal2Numerical" width="90" x="450" y="30"/>
      <operator activated="true" class="x_validation" expanded="true" height="112" name="XValidation" width="90" x="585" y="30">
        <process expanded="true" height="460" width="458">
          <operator activated="true" class="adaboost" expanded="true" height="76" name="AdaBoost" width="90" x="45" y="30">
            <process expanded="true" height="460" width="966">
              <operator activated="true" class="naive_bayes_kernel" expanded="true" height="76" name="KernelNaiveBayes" width="90" x="45" y="30"/>
              <connect from_port="training set" to_op="KernelNaiveBayes" to_port="training set"/>
              <connect from_op="KernelNaiveBayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <connect from_port="training" to_op="AdaBoost" to_port="training set"/>
          <connect from_op="AdaBoost" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="460" width="458">
          <operator activated="true" class="apply_model" expanded="true" height="76" name="ModelApplier" width="90" x="45" y="30">
            <list key="application_parameters"/>
            <parameter key="create_view" value="true"/>
          </operator>
          <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
          <operator activated="true" class="write_as_text" expanded="true" height="76" name="ResultWriter" width="90" x="319" y="30">
            <parameter key="result_file" value="G:\Rapping\model_results.csv"/>
          </operator>
          <connect from_port="model" to_op="ModelApplier" to_port="model"/>
          <connect from_port="test set" to_op="ModelApplier" to_port="unlabelled data"/>
          <connect from_op="ModelApplier" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_op="ResultWriter" to_port="input 1"/>
          <connect from_op="ResultWriter" from_port="input 1" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="read_csv" expanded="true" height="60" name="Score Set" width="90" x="45" y="165"/>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="782" y="165">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="CSVExampleSource" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Bootstrapping" to_port="example set input"/>
      <connect from_op="Bootstrapping" from_port="example set output" to_op="Nominal2Numerical" to_port="example set input"/>
      <connect from_op="Nominal2Numerical" from_port="example set output" to_op="XValidation" to_port="training"/>
      <connect from_op="XValidation" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="XValidation" from_port="averagable 1" to_port="result 1"/>
      <connect from_op="Score Set" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Greetings,
Sebastian