"store a ROC plot for each iteration of a subprocess"
Legacy User
New Altair Community Member
Hello,
I routinely use ROC plots to compare different learning algorithms and parameter settings. When not using RapidMiner, I generally dump multiple ROC plots to disk, one for every parameter value, feature selection round, etc.
I can't seem to find a way to do the same in RapidMiner. Is it possible?
I would prefer a solution that does not require the GUI: even though I use the GUI to design workflows, whenever I need to run RapidMiner on a full dataset I have to run it from the command line on a computing cluster.
Moreover, I usually prefer to store the raw data so I can then reproduce the plots in my graphic library of choice (R or matplotlib).
Hence, I was wondering whether there is a way to automatically export or save ROC plots to disk (as images or, even better, as raw data).
For example, in backward/forward attribute selection, I'd like to compare the ROC curve for every generation.
Things I have thought/tried so far:
- I don't see a 'Write ROC' operator.
- I tried using the 'Write Performance' operator, but RapidMiner cannot read the result file it generates (neither by opening it through the GUI nor through the 'Read Performance' operator).
- I have thought of using 'Write Performance' and then parsing the resulting XML file with Python outside of RapidMiner, but I still can't figure out how to write a separate file for every iteration of the subprocess. Is there an operator that can add a suffix to the filename and increment its value on every loop, or something similar?
Many thanks,
eli
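For the raw-data side of the question, here is a minimal sketch of what could be done outside RapidMiner. It assumes that the true labels and the positive-class confidences for each iteration have already been exported by some means (how to do that from the process is not shown here). The function names `roc_points` and `write_roc` and the `roc_003.csv` naming scheme are hypothetical, chosen only for illustration.

```python
import csv
from pathlib import Path


def roc_points(labels, scores):
    """Return a list of (fpr, tpr) pairs, one per distinct score threshold.

    labels: sequence of 0/1 values (1 = positive class)
    scores: classifier confidence for the positive class, same order
    """
    # Sort examples from highest to lowest confidence.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so ties share one point.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def write_roc(points, out_dir, iteration):
    """Write one CSV per iteration, e.g. roc_003.csv (hypothetical naming)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"roc_{iteration:03d}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["fpr", "tpr"])
        writer.writerows(points)
    return path
```

Calling `write_roc(roc_points(labels, scores), "rocs", i)` once per iteration `i` would leave one two-column file per generation, which R or matplotlib can read and overlay directly.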
Answers
Hi Eli,
give the Reporting Extension a try. It offers a ReportGenerator that opens a report in one of various file formats. Then insert a Report operator to add a specific IOObject to the report, for example a plot of the ROC chart.
Of course, you can also add text, for example to describe the current parameter setting. Macros help a great deal there.
Greetings,
Sebastian
Hi Sebastian,
I tried using the Reporting Extension, but I cannot see an obvious way to output ROC curves (or their data).
Setting the Report operator to expect anything other than a Performance Vector returns an error. The Performance Vector, however, contains only the confusion matrix and the AUC value, but no curve data.
Thanks,
eli
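To illustrate why the AUC value alone is not enough to reproduce a plot, here is a small sketch using the trapezoidal rule. The function name `auc_from_points` and the two example curves are made up for illustration; they are not RapidMiner output.

```python
def auc_from_points(points):
    """Area under a ROC curve given as (fpr, tpr) pairs, via the trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each pair of neighbouring points contributes one trapezoid.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Two hypothetical curves with clearly different shapes.
curve_a = [(0.0, 0.0), (0.0, 0.5), (1.0, 1.0)]
curve_b = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]

print(auc_from_points(curve_a))  # 0.75
print(auc_from_points(curve_b))  # 0.75
```

Both curves have the same area, so the curve points themselves have to be stored if the plot is to be redrawn later.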
Hi Eli,
give this process a try:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
    <process expanded="true" height="476" width="681">
      <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="random classification"/>
      </operator>
      <operator activated="true" class="compare_rocs" compatibility="5.0.8" expanded="true" height="76" name="Compare ROCs" width="90" x="179" y="30">
        <process expanded="true" height="608" width="894">
          <operator activated="true" class="decision_tree" compatibility="5.0.8" expanded="true" height="76" name="Decision Tree" width="90" x="112" y="30"/>
          <connect from_port="train 1" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model 1"/>
          <portSpacing port="source_train 1" spacing="0"/>
          <portSpacing port="source_train 2" spacing="0"/>
          <portSpacing port="sink_model 1" spacing="0"/>
          <portSpacing port="sink_model 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="reporting:generate_report" compatibility="5.0.2" expanded="true" height="76" name="Generate Report" width="90" x="313" y="30">
        <parameter key="report_name" value="test"/>
        <parameter key="pdf_output_file" value="c:\test.pdf"/>
      </operator>
      <operator activated="true" class="reporting:report" compatibility="5.0.2" expanded="true" height="60" name="Report" width="90" x="447" y="30">
        <parameter key="specified" value="true"/>
        <parameter key="reportable_type" value="ROC Comparison"/>
        <parameter key="renderer_name" value="ROC Comparison"/>
        <list key="parameters"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Compare ROCs" to_port="example set"/>
      <connect from_op="Compare ROCs" from_port="rocComparison" to_op="Generate Report" to_port="through 1"/>
      <connect from_op="Generate Report" from_port="through 1" to_op="Report" to_port="reportable in"/>
      <connect from_op="Report" from_port="reportable out" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Greetings,
Sebastian
Hi Sebastian,
thanks for the example process.
Is the ROC Comparison the only way to get a ROC plot out of RapidMiner?
thanks,
eli
Hi,
Currently, yes.
Greetings,
Sebastian