Regression with Random Forest?

phivu
phivu New Altair Community Member
edited November 2024 in Community Q&A

Hi RapidMiner,

I'm doing regression with 480 input features. I tried the Deep Learning operator, but the training root mean square error is still quite high. Now I'm trying Random Forest because of its random subspace approach, but I found that the Random Forest operator cannot handle a numerical label. How can I deal with this?

Thank you very much for your support.

Best Regards,

phivu
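For reference, here is what random forest regression does with a numerical label, as a minimal pure-Python sketch. It is an illustration under stated assumptions, not RapidMiner's or R's implementation. All function and parameter names are hypothetical; `mtry` is borrowed from R's randomForest, which the answers below use. Each tree is grown on a bootstrap sample. At every node the features are examined in random order (the random subspace idea), and the split that minimises the summed squared error of the two children is kept. Leaves predict the mean label, and the forest averages its trees. One further assumption: the sketch looks past `mtry` features only when none of the first `mtry` features gives a valid split, which is the behaviour scikit-learn documents for its trees.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (the regression split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, mtry, depth, max_depth, min_leaf, rng):
    """Grow one regression tree; leaves are floats, inner nodes are tuples."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: predict the mean of the numerical label
    order = list(range(len(X[0])))
    rng.shuffle(order)  # random subspace: examine features in random order
    best = None
    for k, f in enumerate(order):
        # Stop after `mtry` features once at least one valid split exists.
        if k >= mtry and best is not None:
            break
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return mean(y)  # no feature offers a valid split
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          mtry, depth + 1, max_depth, min_leaf, rng)

    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    """Follow the thresholds down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, mtry=None, max_depth=8, min_leaf=1, seed=0):
    """Bagging: every tree sees a bootstrap sample of the training rows."""
    if mtry is None:
        mtry = max(1, len(X[0]) // 3)  # R's randomForest default for regression
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 mtry, 0, max_depth, min_leaf, rng))
    return forest


def predict_forest(forest, row):
    """Regression output: the average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)


# Toy data: x0 carries the signal, x1 is noise.
X = [[float(i), float((i * 7) % 5)] for i in range(40)]
y = [10.0 if i > 20 else 0.0 for i in range(40)]
forest = fit_forest(X, y, n_trees=30, mtry=1, seed=1)
print(predict_forest(forest, [36.0, 2.0]), predict_forest(forest, [1.0, 2.0]))
```

The split search is quadratic in the number of rows per node, so this is meant for reading, not for a 480-feature dataset. With that many inputs, `mtry` is the main random subspace knob. R's randomForest defaults to one third of the features for regression.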

Answers

  • earmijo
    earmijo New Altair Community Member
    Answer ✓

    You cannot do it in RapidMiner unless you are willing to use R scripts. However, the latest version of RM has a new Gradient Boosted Trees operator, which is competitive with Random Forest and can handle both numerical and polynominal labels. Explore it.
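Gradient boosting copes naturally with a numerical label because, for squared error, each round just fits a small tree to the current residuals. The pure-Python sketch below illustrates that idea with depth-1 trees (stumps). It is not the Gradient Boosted Trees operator's implementation, and `fit_gbt`, `n_rounds` and `learning_rate` are hypothetical names chosen for the example.

```python
from statistics import mean


def fit_stump(X, r):
    """Depth-1 regression tree that minimises squared error on targets r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r[i] for i, row in enumerate(X) if row[f] <= t]
            right = [r[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            lmean, rmean = mean(left), mean(right)
            sse = sum((v - lmean) ** 2 for v in left) + sum((v - rmean) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lmean, rmean)
    if best is None:  # all rows identical: fall back to a constant
        m = mean(r)
        return (0, float("inf"), m, m)
    return best[1:]  # (feature, threshold, left value, right value)


def predict_stump(stump, row):
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value


def fit_gbt(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the residuals."""
    base = mean(y)  # round 0: a constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the loss (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


X = [[float(i)] for i in range(20)]
y = [3.0 * i for i in range(20)]
model = fit_gbt(X, y)
print(round(predict_gbt(model, [10.0]), 1))
```

A smaller `learning_rate` shrinks each stump's contribution and generally needs more rounds. Nothing in the loop assumes class labels; the residuals are plain numbers.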

  • phivu
    phivu New Altair Community Member

    Thank you, earmijo. Could you explain in more detail how to use RapidMiner with R to do regression with Random Forest?

  • earmijo
    earmijo New Altair Community Member
    Answer ✓

    Install the R Script Extension, make sure R is installed on your computer together with the randomForest package (both scripts call library(randomForest)), and run the process below. I adapted the sample code that comes with the application to run Random Forest on a regression problem.

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
            <description align="center" color="blue" colored="true" width="126">Fetch example data</description>
          </operator>
          <operator activated="true" class="split_data" compatibility="7.3.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="34">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.5"/>
              <parameter key="ratio" value="0.5"/>
            </enumeration>
            <description align="center" color="purple" colored="true" width="126">Split the data into a training and a test set</description>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Learn Model" width="90" x="380" y="34">
            <parameter key="script" value="# train a random Forest on the training data and return the learned model&#10;&#10;rm_main = function(data)&#10;{&#10; library(randomForest) &#10;&#9;Model.rf &lt;- randomForest(label~., data =data,mtry=3,importance=FALSE,na.action=na.omit)&#10; &#9;return(Model.rf)&#10;}&#10;"/>
            <description align="center" color="red" colored="true" width="126">Train a RandomForest model in R and return it as an R object</description>
          </operator>
          <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="103" name="Apply R Model" width="90" x="514" y="238">
            <parameter key="script" value="## load the trained model and apply it on the test data&#10;&#10;rm_main = function(model, data)&#10;{&#10; library(randomForest)&#10; # apply the model and build a prediction&#10; result &lt;-predict(model, data)&#10;&#10; # add the prediction to the example set&#10; data$prediction &lt;- result&#10; &#10; # update the meta data&#10; metaData$data$prediction &lt;&lt;- list(type=&quot;real&quot;, role=&quot;prediction&quot;)&#10; &#10; return(data)&#10;}&#10;"/>
            <description align="center" color="red" colored="true" width="126">Apply the trained model to the test data</description>
          </operator>
          <connect from_op="Retrieve Polynomial" from_port="output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="Learn Model" to_port="input 1"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Apply R Model" to_port="input 2"/>
          <connect from_op="Learn Model" from_port="output 1" to_op="Apply R Model" to_port="input 1"/>
          <connect from_op="Apply R Model" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • phivu
    phivu New Altair Community Member

    That's great, thanks!

  • CraigBostonUSA
    CraigBostonUSA New Altair Community Member

    UPDATE: As of version 8.0, Decision Tree and Random Forest can now handle numerical labels and solve regression problems.

    https://docs.rapidminer.com/latest/studio/releases/changes-8.0.0.html