Avoid the execution of all the processes and nodes every time

f_laperna · October 2017

Hi, I'm new to RapidMiner and there is one thing I can't figure out. I built a process for Data Prep, and now I'm working on another process for Classification. But every time I want to run some nodes of the classification process, the initial Data Prep process also has to run again from the beginning. Is it possible to somehow store the result of the previous run and execute only the classification process? Moreover, is it possible to do the same with individual nodes (for example, the one reading the dataset) and execute only the last nodes I added?

Thank you!

sgenzer · October 2017

Hello @f_laperna and welcome to the RapidMiner User Community. We are very happy you are here.

So yes, I would recommend using the "Store" operator to store your data prep example set so it does not need to run every time. Once you do this, you can use the "Retrieve" operator to grab that example set and keep using it for your classification.
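As a minimal sketch of that idea, a stand-alone Data Prep process that ends with a Store operator could look like the XML below. Assumptions: the `Retrieve Raw_Data` operator and the `../data/Raw_Data` entry are hypothetical stand-ins for your own data prep steps, and the target entry `../data/Filtered_Data` simply mirrors the entry retrieved by the classification process posted later in this thread. Relative repository paths are resolved from the location of the process, so both processes should live in the same repository folder for the paths to match.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <!-- Hypothetical source: replace with your own Data Prep operators -->
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Raw_Data" width="90" x="45" y="85">
        <parameter key="repository_entry" value="../data/Raw_Data"/>
      </operator>
      <!-- Writes the prepared example set to the repository so later processes can Retrieve it -->
      <operator activated="true" class="store" compatibility="7.6.001" expanded="true" height="68" name="Store Filtered_Data" width="90" x="246" y="85">
        <parameter key="repository_entry" value="../data/Filtered_Data"/>
      </operator>
      <connect from_op="Retrieve Raw_Data" from_port="output" to_op="Store Filtered_Data" to_port="input"/>
      <!-- "through" passes the stored data on, so it also shows up in the Results view -->
      <connect from_op="Store Filtered_Data" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

You would run this process once (and again whenever the prep logic changes); the classification process then only needs a Retrieve operator pointing at the same entry.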

If you need more help, please copy and paste your process (in XML) in this thread using the </> tool. It is often easier for us to help this way.

Good luck!

Scott

f_laperna · October 2017

Thank you for your answer! I tried your solution, but now when I run it I get the error "Input is missing". Below you can find the XML and a screenshot of the error.

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="split_data" compatibility="7.6.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="136">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
        <parameter key="sampling_type" value="linear sampling"/>
      </operator>
      <operator activated="true" class="concurrency:parallel_random_forest" compatibility="7.6.001" expanded="true" height="82" name="TRAIN MODEL Random Forest" width="90" x="313" y="34"/>
      <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="447" y="136">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_classification" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="648" y="34">
        <parameter key="main_criterion" value="classification_error"/>
        <parameter key="classification_error" value="true"/>
        <parameter key="root_mean_squared_error" value="true"/>
        <list key="class_weights"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Filtered_Data" width="90" x="45" y="85">
        <parameter key="repository_entry" value="../data/Filtered_Data"/>
      </operator>
      <connect from_op="Split Data" from_port="partition 1" to_op="TRAIN MODEL Random Forest" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="TRAIN MODEL Random Forest" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 2"/>
      <connect from_op="Performance" from_port="example set" to_port="result 1"/>
      <connect from_op="Retrieve Filtered_Data" from_port="output" to_op="Split Data" to_port="example set"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

f_laperna · October 2017

I solved it by creating new nodes (simply copy-pasting the old ones) and connecting everything to the new nodes. Now it works fine.

Telcontar120 · October 2017

You can also use breakpoints to run only part of a process and view the output up to that point; this can be helpful when building long processes. Right-click on any operator and you will see the options to add breakpoints.
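For illustration, a breakpoint added through the right-click menu is saved as an attribute on the operator in the process XML. The fragment below is a sketch based on the Split Data operator from the process posted above; the `breakpoints` attribute and its `after` value reflect how RapidMiner Studio typically saves breakpoints, so treat the exact syntax as an assumption and prefer setting breakpoints from the GUI.

```xml
<!-- Assumed syntax: breakpoints="after" pauses the run once Split Data has finished,
     so both partitions can be inspected before the model is trained -->
<operator activated="true" breakpoints="after" class="split_data" compatibility="7.6.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="136">
  <enumeration key="partitions">
    <parameter key="ratio" value="0.7"/>
    <parameter key="ratio" value="0.3"/>
  </enumeration>
  <parameter key="sampling_type" value="linear sampling"/>
</operator>
```

Note that a breakpoint pauses the run but does not skip the operators before it, so it complements rather than replaces the Store/Retrieve approach.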