[SOLVED] Write results in different files automatically?
T-Unit
New Altair Community Member
Hi everyone,
I'm doing some clustering and want to build several cluster models (k-medoids) using different model parameters (number of clusters, max runs, max optimization steps, ...). To do so I used the "Optimize" operator to generate several cluster models with different parameters. I need the clustered data for further analysis (it doesn't matter whether the parameter combinations are optimal), so I use the "Write Excel" operator to export the generated data to an Excel sheet. However, the final Excel file only contains the clustered data of the first run (e.g. when k was 2). In the "Optimize" operator I tell the process to vary (for example) the number of clusters from k = 2 to 20.
My question:
Is it possible to change the name of the output file automatically while the process is running?
I mean it this way:
choose k=2 --> do the clustering --> save the results to file named "results_k_2.xls"
choose k=3 --> do the clustering --> save the results to file named "results_k_3.xls"
...
choose k= 20 --> do the clustering --> save the results to file named "results_k_20.xls"
Thanks for help.
Greetings,
Thomas
Answers
-
Hello,
You can use macros to do this. If you have a macro containing k then you could create another from it containing the filename you want and use that as the parameter to the write excel operator.
Regards
Andrew
-
Hello Andrew,
First of all, thanks for your fast reply.
Your idea sounds logical to me but, to be honest, I don't have a clue how to work with macros in RapidMiner. I know neither how and where to define them nor how to use them in the process. Maybe you can recommend a website where working with macros in RapidMiner is explained in detail? Your blog post from September 15th gives a short look at what macros can be used for, but I can't apply this to my process.
Regards,
Thomas
-
Hello
You could modify this example
http://rapidminernotes.blogspot.co.uk/2012/07/chopping-files-into-smaller-bits.html
regards
Andrew
-
Macros are a kind of named variable that you can set and use anywhere in the process. There are two ways to set a macro:
- In the context tab of your process
- With the macro operators in Utility/Macros (see the help tab for usage)
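As a rough illustration of the second option, a "Set Macro" operator followed by a writer that references the macro might look like the fragment below. This is only a sketch, not a complete process: the macro name k, the fixed value 2 and the relative output path are placeholders chosen here, and the connections between the operators are omitted. Inside any parameter value, a macro is referenced as %{name}.

```xml
<!-- Sketch only: macro name, value and file path are illustrative assumptions -->
<operator activated="true" class="set_macro" compatibility="5.2.009" expanded="true" name="Set Macro">
  <!-- Defines a macro called "k" with the fixed value 2 -->
  <parameter key="macro" value="k"/>
  <parameter key="value" value="2"/>
</operator>
<operator activated="true" class="write_excel" compatibility="5.2.009" expanded="true" name="Write Excel">
  <!-- %{k} is replaced by the macro's current value when this operator runs -->
  <parameter key="excel_file" value="results_k_%{k}.xls"/>
</operator>
```

With the values above the writer would target results_k_2.xls. The macro has to be set before the writer executes, otherwise %{k} cannot be resolved.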
-
Hello Marcin,
I implemented a macro called "k" using the "Set Macro" operator. How can I give this macro the current number of clusters of the "Cluster" operator (the number of clusters is set by the "Optimize Parameters" operator and changes from 2 to 20)? I tried "operator.Clustering.parameter.k", but this didn't work properly. Instead of different files such as "results_k_2.xls", "results_k_3.xls", ... I got only one file named "results_k_operator.Clustering.parameter.k.xls". Maybe it's impossible to access the value of a model's parameters directly?
Regards,
Thomas
-
I thought there was a predefined macro for this, but I was wrong. So unfortunately there is no easy way to do this, only a hack: you can log the parameter of an operator, convert the log to an example set, and extract a macro from the last example (index -1) of this example set. Here is an example process.
Please note that the "Extract Macro" operator has to be executed before you use the macro (click on the blue double-arrow with the question mark to check and alter the execution order).
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.009">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.009" expanded="true" name="Process">
    <process expanded="true" height="520" width="643">
      <operator activated="true" class="retrieve" compatibility="5.2.009" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.009" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="246" y="75">
        <list key="parameters">
          <parameter key="Clustering.k" value="[2.0;20;19;linear]"/>
        </list>
        <process expanded="true" height="538" width="643">
          <operator activated="true" class="k_means" compatibility="5.2.009" expanded="true" height="76" name="Clustering" width="90" x="45" y="30">
            <parameter key="k" value="20"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.009" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.2.009" expanded="true" height="76" name="Log" width="90" x="45" y="210">
            <list key="log">
              <parameter key="k" value="operator.Clustering.parameter.k"/>
            </list>
          </operator>
          <operator activated="true" class="log_to_data" compatibility="5.2.009" expanded="true" height="94" name="Log to Data" width="90" x="179" y="210"/>
          <operator activated="true" class="write_csv" compatibility="5.2.009" expanded="true" height="76" name="Write CSV" width="90" x="380" y="300">
            <parameter key="csv_file" value="/home/marcin/temp/result_k_%{k}.csv"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="5.2.009" expanded="true" height="60" name="Extract Macro" width="90" x="313" y="165">
            <parameter key="macro" value="k"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="k"/>
            <parameter key="example_index" value="-1"/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.009" expanded="true" height="76" name="Performance" width="90" x="514" y="300"/>
          <connect from_port="input 1" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Log to Data" from_port="through 1" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
-
Thanks for your fast reply and your suggestion, Marcin!
I searched the forum and found a thread that helped me solve my problem (the hint from Sebastian Land did it):
http://rapid-i.com/rapidforum/index.php/topic,1014.0.html
So here is my adaptation:
I put a "Clone Parameters" operator after the clustering operator. The "Clone Parameters" operator is connected to the "Set Macro" operator. In the "Clone Parameters" operator I filled in the following:
source: Clustering.k
target: Set Macro.value
So the changing value of k is copied to the value for the macro and the macro is later used to generate the different filenames (results_k_2.xls, results_k_3.xls, and so on).
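In the process XML, this setup might look roughly like the fragment below. It is a sketch under assumptions rather than an export of the actual process: the class key clone_parameters and the list key name_map are written from memory and may differ between RapidMiner versions, the initial macro value is a placeholder, and connections are omitted. Configuring the operator in the GUI and checking the XML view is the reliable way to get the exact form.

```xml
<!-- Sketch only: class key "clone_parameters" and list key "name_map" are assumptions -->
<operator activated="true" class="clone_parameters" compatibility="5.2.009" expanded="true" name="Clone Parameters">
  <list key="name_map">
    <!-- source (key): Clustering.k, target (value): Set Macro.value -->
    <parameter key="Clustering.k" value="Set Macro.value"/>
  </list>
</operator>
<operator activated="true" class="set_macro" compatibility="5.2.009" expanded="true" name="Set Macro">
  <parameter key="macro" value="k"/>
  <!-- Placeholder; overwritten by Clone Parameters in every iteration -->
  <parameter key="value" value="0"/>
</operator>
```

For this to work, "Clone Parameters" has to execute before "Set Macro", and "Set Macro" before the writer whose file name contains %{k}.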
The solution is fairly simple, but I am sure I would never have solved this problem by myself (or even expected that the "Clone Parameters" operator would do it). I hope this helps other users with the same problem.
Regards and thanks to all who tried to help me,
Thomas