Importing Example from RapidMiner book - XML
I am trying to reproduce the process from Chapter 11 at http://rapidminerbook.com/index.php/chapter-downloads/chapter-11/. I have saved the dataset and processes as .xml files in My Documents. From the File menu in RapidMiner Studio, I try to import the XML process, but RapidMiner can't find the files. What am I doing wrong? Thanks in advance for your support.
Answers
-
Hi @Maerkli,
1. You have to open your XML files in a notebook and then copy the whole code (Ctrl + A, then Ctrl + C).
2. In RapidMiner, you have to activate the XML panel (View > Show Panel > XML).
3. Delete the existing code and paste your XML code in the XML panel.
4. Click on the "check" button.
5. That's it... the process appears in the main window.
I hope it helps,
Regards,
Lionel
1 -
Hi again @Maerkli,
I meant to say:
"1. You have to open your XML files in a text editor (for example WordPad or Notepad) and then copy the whole code."
Regards,
Lionel
0 -
Thanks, Lionel. I had already tried it this way. Question: from this point, how do I get the process to show up in the Process view? I have the XML code in the XML panel.
Maerkli
0 -
@Maerkli you should click the green check mark. If there are no errors in the XML, the operators should populate in the Process view.
1 -
Hello Thomas,
That is the point. I have clicked the green check mark, but the operators do not populate the Process view. The code I used is taken exactly from http://rapidminerbook.com/index.php/chapter-downloads/chapter-11/, edited by Dr. Markus Hofmann & Ralf Klinkenberg.
Maerkli
0 -
Hello @Maerkli, welcome to the community! Some quick recommendations for you:
• Post your XML process here in this thread (see https://youtu.be/KkgB5QXWXJ8 and "Read Before Posting" on the right when you reply)
• Attach your dataset if possible (use a fictionalized version if there are privacy concerns)
• Make sure you have all necessary extensions installed (see https://youtu.be/pjBqG3xtXx4)
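One rough way to spot extension-related problems before pasting is to scan the process XML for operator class keys. This is a minimal sketch, not an official RapidMiner tool. It rests on two assumptions: that a class key with a prefix before a colon (for instance `productivity:execute_process`) marks an operator from a separate namespace, often an extension, and that `class="dummy"` marks a placeholder for an operator Studio could not resolve. The function name `summarize_operators` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_operators(xml_text):
    """Collect operator class keys from a process XML string.

    Returns (namespaced, dummies): the sorted class keys that carry a
    "prefix:" and the names of operators whose class is "dummy".
    Assumption: both are only hints that an extension may be involved.
    """
    # Strip surrounding whitespace so the XML declaration is at the very start.
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    namespaced = set()
    dummies = []
    for op in root.iter("operator"):
        cls = op.get("class", "")
        if ":" in cls:
            namespaced.add(cls)
        elif cls == "dummy":
            dummies.append(op.get("name", ""))
    return sorted(namespaced), dummies


if __name__ == "__main__":
    sample = """<?xml version="1.0" encoding="UTF-8"?>
    <process version="8.1.001">
      <operator class="process" name="Process">
        <process>
          <operator class="productivity:execute_process" name="Execute Process"/>
          <operator class="dummy" name="Execute Script (2)"/>
          <operator class="rename" name="Rename"/>
        </process>
      </operator>
    </process>"""
    namespaced, dummies = summarize_operators(sample)
    print("Namespaced operator classes:", namespaced)
    print("Dummy placeholders:", dummies)
```

If the scan reports namespaced classes or dummy placeholders, that is a reason to look at the installed extensions, not proof that one is missing.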
Scott
0 -
Hello Scott,
Before posting, I have read the recommendations several times. I have attached the xml files in question as well.
Thanks for the support.
Maerkli
0 -
@Maerkli I just loaded the cluster one fine. See attached RMP file. Haven't checked the other one.
1 -
Thomas,
I have tried with your attached file and it works. I note that you have a .rmp file. That is perhaps the explanation. From http://rapidminerbook.com/index.php/chapter-downloads/chapter-11/, how can I get a .rmp and not an .xml?
Maerkli
0 -
@Maerkli I just downloaded the zip file, extracted it, opened the cluster XML file in a text editor, copied and pasted it into the XML view, and clicked the green check mark. Everything populated.
If it's not working for you, check that you copied and pasted the entire XML.
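A quick way to check whether the copied text is complete is to run it through an XML parser before pasting. This is a minimal sketch using Python's standard library. The function name `check_process_xml` is hypothetical, and the test that the root element is `<process>` is an assumption based on the process files in this thread.

```python
import xml.etree.ElementTree as ET


def check_process_xml(text):
    """Return (ok, message) for a copied RapidMiner process XML string."""
    try:
        # Strip surrounding whitespace: an XML declaration must be at the very start.
        root = ET.fromstring(text.strip().encode("utf-8"))
    except ET.ParseError as err:
        line, column = err.position
        return False, f"not well-formed at line {line}, column {column}: {err}"
    # Assumption: a process file has <process> as its root element.
    if root.tag != "process":
        return False, f"unexpected root element <{root.tag}>"
    operators = root.findall(".//operator")
    return True, f"well-formed, {len(operators)} operator(s) found"


if __name__ == "__main__":
    complete = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<process version="8.1.001">'
        '<operator class="process" name="Process"/>'
        '</process>'
    )
    # Simulate an incomplete copy by cutting off the closing </process> tag.
    truncated = complete[: -len("</process>")]
    print(check_process_xml(complete))
    print(check_process_xml(truncated))
```

A truncated copy usually fails with a parse error near the end of the text, which points to a missing closing tag.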
1 -
Thomas,
I do exactly that, but the XML file does not populate the Process view. If I copy/paste your .rmp file, it works. Really mysterious!
Maerkli
PS. Shall I watch the Champions League match, Real against Juventus, or spend my night trying to solve this issue?
2 -
That's mysterious indeed, but here are a few things to try:
1. You can try to quit and re-open RapidMiner.
2. Update RapidMiner to the latest version.
3. Can you repeat the procedure (delete the existing code in the XML panel, paste the new code, then click the check button) with this XML code (your 2 XML files):
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<context>
<input/>
<output/>
<macros>
<macro>
<key>numberOfClusterIterations</key>
<value>15</value>
</macro>
<macro>
<key>processToRun</key>
<value>readAndProcessEcoliData</value>
</macro>
<macro>
<key>locationOfData</key>
<value>PathToYourData\ecoli.data</value>
</macro>
</macros>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="7.1.001" expanded="true" height="68" name="Generate data for testing" width="90" x="179" y="30">
<parameter key="target_function" value="gaussian mixture clusters"/>
<parameter key="number_of_attributes" value="3"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="2"/>
</operator>
<operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename" width="90" x="179" y="120">
<parameter key="old_name" value="label"/>
<parameter key="new_name" value="site"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="productivity:execute_process" compatibility="6.0.002" expanded="true" height="82" name="Execute Process" width="90" x="45" y="30">
<parameter key="process_location" value="%{processToRun}"/>
<parameter key="use_input" value="false"/>
<parameter key="store_output" value="true"/>
<parameter key="propagate_metadata_recursively" value="false"/>
<parameter key="cache_process" value="false"/>
<list key="macros">
<parameter key="fileToRead" value="%{locationOfData}"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="120">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|sequenceName"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="loop" compatibility="8.1.001" expanded="true" height="82" name="Generate clusters" width="90" x="45" y="255">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="macro_start_value" value="2"/>
<parameter key="iterations" value="%{numberOfClusterIterations}"/>
<process expanded="true">
<operator activated="true" class="materialize_data" compatibility="8.1.001" expanded="true" height="76" name="Materialize Data" width="90" x="45" y="30"/>
<operator activated="true" class="k_means" compatibility="8.1.001" expanded="true" height="76" name="Clustering" width="90" x="179" y="30">
<parameter key="k" value="%{iteration}"/>
<parameter key="measure_types" value="NumericalMeasures"/>
</operator>
<operator activated="true" class="remember" compatibility="8.1.001" expanded="true" height="60" name="Remember: clusters" width="90" x="313" y="30">
<parameter key="name" value="%{iteration}_model"/>
<parameter key="io_object" value="CentroidClusterModel"/>
</operator>
<operator activated="true" class="remember" compatibility="8.1.001" expanded="true" height="60" name="Remember: example set" width="90" x="313" y="120">
<parameter key="name" value="%{iteration}"/>
</operator>
<connect from_port="input 1" to_op="Materialize Data" to_port="example set input"/>
<connect from_op="Materialize Data" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_op="Remember: clusters" to_port="store"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Remember: example set" to_port="store"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="6.0.003" expanded="true" height="103" name="Generate ground truth measures" width="90" x="45" y="390">
<list key="parameters">
<parameter key="Recall 1st example.name" value="2,3,4,5,6,7,8,9,10,11,12,13,14,15,16"/>
</list>
<process expanded="true">
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall 1st example" width="90" x="45" y="30">
<parameter key="name" value="16"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="76" name="Set cluster role regular" width="90" x="180" y="30">
<parameter key="attribute_name" value="cluster"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="76" name="Rename cluster to cluster1" width="90" x="313" y="30">
<parameter key="old_name" value="cluster"/>
<parameter key="new_name" value="cluster1"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="76" name="Select cluster1 and id" width="90" x="447" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="cluster1||id"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall ground truth" width="90" x="45" y="255">
<parameter key="name" value="8"/>
<parameter key="remove_from_store" value="false"/>
<description align="center" color="transparent" colored="false" width="126">For both data sets, the ground truth is 8 clusters - for other data sets, this number could be different. The recalled example set is the 8th but it does not have to be - the important point is renaming the ground truth cluster indicator - site - to cluster2 later.</description>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="76" name="Set site role regular" width="90" x="179" y="255">
<parameter key="attribute_name" value="site"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="76" name="Rename site to cluster2" width="90" x="313" y="255">
<parameter key="old_name" value="site"/>
<parameter key="new_name" value="cluster2"/>
<list key="rename_additional_attributes"/>
<description align="center" color="transparent" colored="false" width="126">This renames the ground truth cluster which is called site to cluster2</description>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="76" name="Select cluster2 and id" width="90" x="447" y="255">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="cluster2||id"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="join" compatibility="8.1.001" expanded="true" height="76" name="Join cluster1 and cluster2" width="90" x="581" y="120">
<list key="key_attributes"/>
</operator>
<operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="76" name="Calculate groundTruthClusterValidityIndices" width="90" x="715" y="120">
<process expanded="true">
<operator activated="true" class="dummy" compatibility="8.1.001" expanded="true" height="76" name="R script: calculate validity indices" width="90" x="45" y="30">
<description align="center" color="transparent" colored="false" width="126">Comment out these lines after the first run using ##
&lt;br&gt;
&lt;br&gt;
##install.packages("mclust")
##install.packages("profdpm")
</description>
</operator>
<operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="76" name="Extract ground truth performance measures" width="90" x="179" y="30">
<process expanded="true">
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="ARI (2)" width="90" x="112" y="30">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="ARI"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="FM (2)" width="90" x="112" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="FM"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="R (2)" width="90" x="313" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="R"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="W10 (2)" width="90" x="447" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="W10"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="W01 (2)" width="90" x="581" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="W01"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="J (2)" width="90" x="715" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="J"/>
<parameter key="example_index" value="1"/>
</operator>
<connect from_port="in 1" to_op="ARI (2)" to_port="example set"/>
<connect from_op="ARI (2)" from_port="performance" to_port="out 1"/>
<connect from_op="ARI (2)" from_port="example set" to_op="FM (2)" to_port="example set"/>
<connect from_op="FM (2)" from_port="example set" to_op="R (2)" to_port="example set"/>
<connect from_op="R (2)" from_port="example set" to_op="W10 (2)" to_port="example set"/>
<connect from_op="W10 (2)" from_port="example set" to_op="W01 (2)" to_port="example set"/>
<connect from_op="W01 (2)" from_port="example set" to_op="J (2)" to_port="example set"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="8.1.001" expanded="true" height="76" name="Log: Ground" width="90" x="313" y="30">
<list key="log">
<parameter key="ARI" value="operator.ARI (2).value.performance"/>
<parameter key="FM" value="operator.FM (2).value.performance"/>
<parameter key="J" value="operator.J (2).value.performance"/>
<parameter key="R" value="operator.R (2).value.performance"/>
<parameter key="W01" value="operator.W01 (2).value.performance"/>
<parameter key="W10" value="operator.W10 (2).value.performance"/>
<parameter key="k1" value="operator.Recall 1st example.parameter.name"/>
<parameter key="k2" value="operator.Recall ground truth.parameter.name"/>
</list>
</operator>
<connect from_op="Extract ground truth performance measures" from_port="out 1" to_op="Log: Ground" to_port="through 1"/>
<connect from_op="Log: Ground" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="Recall 1st example" from_port="result" to_op="Set cluster role regular" to_port="example set input"/>
<connect from_op="Set cluster role regular" from_port="example set output" to_op="Rename cluster to cluster1" to_port="example set input"/>
<connect from_op="Rename cluster to cluster1" from_port="example set output" to_op="Select cluster1 and id" to_port="example set input"/>
<connect from_op="Select cluster1 and id" from_port="example set output" to_op="Join cluster1 and cluster2" to_port="left"/>
<connect from_op="Recall ground truth" from_port="result" to_op="Set site role regular" to_port="example set input"/>
<connect from_op="Set site role regular" from_port="example set output" to_op="Rename site to cluster2" to_port="example set input"/>
<connect from_op="Rename site to cluster2" from_port="example set output" to_op="Select cluster2 and id" to_port="example set input"/>
<connect from_op="Select cluster2 and id" from_port="example set output" to_op="Join cluster1 and cluster2" to_port="right"/>
<connect from_op="Join cluster1 and cluster2" from_port="join" to_op="Calculate groundTruthClusterValidityIndices" to_port="in 1"/>
<connect from_op="Calculate groundTruthClusterValidityIndices" from_port="out 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log_to_data" compatibility="8.1.001" expanded="true" height="124" name="Ground truth measures" width="90" x="179" y="390">
<parameter key="log_name" value="Log: Ground"/>
</operator>
<operator activated="true" class="guess_types" compatibility="7.1.001" expanded="true" height="82" name="Ground" width="90" x="313" y="390"/>
<operator activated="true" class="optimize_parameters_grid" compatibility="6.0.003" expanded="true" height="103" name="Generate external measures" width="90" x="45" y="525">
<list key="parameters">
<parameter key="Recall first.name" value="2,3,4,5,6,7,8,9,10,11,12,13,14,15,16"/>
<parameter key="Recall second.name" value="2,3,4,5,6,7,8,9,10,11,12,13,14,15,16"/>
</list>
<process expanded="true">
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall first" width="90" x="45" y="75">
<parameter key="name" value="16"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="76" name="set first cluster role" width="90" x="179" y="75">
<parameter key="attribute_name" value="cluster"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="76" name="rename first cluster" width="90" x="313" y="75">
<parameter key="old_name" value="cluster"/>
<parameter key="new_name" value="cluster1"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="76" name="select first cluster and id" width="90" x="447" y="75">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="cluster1||id"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall second" width="90" x="45" y="255">
<parameter key="name" value="16"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="76" name="set second cluster role" width="90" x="179" y="255">
<parameter key="attribute_name" value="cluster"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="76" name="rename second cluster" width="90" x="313" y="255">
<parameter key="old_name" value="cluster"/>
<parameter key="new_name" value="cluster2"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="76" name="select second cluster and id" width="90" x="447" y="255">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="cluster2||id"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="join" compatibility="5.1.008" expanded="true" height="76" name="Join" width="90" x="581" y="165">
<list key="key_attributes"/>
</operator>
<operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="76" name="Calculate externalClusterValidityIndices" width="90" x="715" y="165">
<process expanded="true">
<operator activated="true" class="dummy" compatibility="8.1.001" expanded="true" height="76" name="Execute Script (2)" width="90" x="112" y="75"/>
<operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="76" name="Extract external performance measures" width="90" x="246" y="75">
<process expanded="true">
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="ARI" width="90" x="45" y="30">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="ARI"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="FM" width="90" x="179" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="FM"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="R" width="90" x="313" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="R"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="W10" width="90" x="447" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="W10"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="W01" width="90" x="581" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="W01"/>
<parameter key="example_index" value="1"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="8.1.001" expanded="true" height="76" name="J" width="90" x="715" y="165">
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="J"/>
<parameter key="example_index" value="1"/>
</operator>
<connect from_port="in 1" to_op="ARI" to_port="example set"/>
<connect from_op="ARI" from_port="performance" to_port="out 1"/>
<connect from_op="ARI" from_port="example set" to_op="FM" to_port="example set"/>
<connect from_op="FM" from_port="example set" to_op="R" to_port="example set"/>
<connect from_op="R" from_port="example set" to_op="W10" to_port="example set"/>
<connect from_op="W10" from_port="example set" to_op="W01" to_port="example set"/>
<connect from_op="W01" from_port="example set" to_op="J" to_port="example set"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="8.1.001" expanded="true" height="76" name="Log: External" width="90" x="380" y="75">
<list key="log">
<parameter key="ARI" value="operator.ARI.value.performance"/>
<parameter key="FM" value="operator.FM.value.performance"/>
<parameter key="J" value="operator.J.value.performance"/>
<parameter key="R" value="operator.R.value.performance"/>
<parameter key="W01" value="operator.W01.value.performance"/>
<parameter key="W10" value="operator.W10.value.performance"/>
<parameter key="k1" value="operator.Recall first.parameter.name"/>
<parameter key="k2" value="operator.Recall second.parameter.name"/>
</list>
</operator>
<connect from_op="Extract external performance measures" from_port="out 1" to_op="Log: External" to_port="through 1"/>
<connect from_op="Log: External" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="Recall first" from_port="result" to_op="set first cluster role" to_port="example set input"/>
<connect from_op="set first cluster role" from_port="example set output" to_op="rename first cluster" to_port="example set input"/>
<connect from_op="rename first cluster" from_port="example set output" to_op="select first cluster and id" to_port="example set input"/>
<connect from_op="select first cluster and id" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Recall second" from_port="result" to_op="set second cluster role" to_port="example set input"/>
<connect from_op="set second cluster role" from_port="example set output" to_op="rename second cluster" to_port="example set input"/>
<connect from_op="rename second cluster" from_port="example set output" to_op="select second cluster and id" to_port="example set input"/>
<connect from_op="select second cluster and id" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Calculate externalClusterValidityIndices" to_port="in 1"/>
<connect from_op="Calculate externalClusterValidityIndices" from_port="out 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="source_input 3" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log_to_data" compatibility="8.1.001" expanded="true" height="124" name="External validity measures" width="90" x="179" y="525">
<parameter key="log_name" value="Log: External"/>
</operator>
<operator activated="true" class="guess_types" compatibility="7.1.001" expanded="true" height="82" name="External" width="90" x="313" y="525"/>
<operator activated="true" class="loop" compatibility="8.1.001" expanded="true" height="103" name="Generate internal measures" width="90" x="45" y="660">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="macro_start_value" value="2"/>
<parameter key="iterations" value="%{numberOfClusterIterations}"/>
<process expanded="true">
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall cluster model" width="90" x="45" y="30">
<parameter key="name" value="%{iteration}_model"/>
<parameter key="io_object" value="CentroidClusterModel"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall ith example set" width="90" x="45" y="120">
<parameter key="name" value="%{iteration}"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="provide_macro_as_log_value" compatibility="8.1.001" expanded="true" height="76" name="Log iteration macro" width="90" x="45" y="210">
<parameter key="macro_name" value="iteration"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="112" name="Multiply (2)" width="90" x="179" y="30"/>
<operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="112" name="Multiply (3)" width="90" x="179" y="210"/>
<operator activated="true" class="data_to_similarity" compatibility="8.1.001" expanded="true" height="76" name="Data to Similarity" width="90" x="179" y="345"/>
<operator activated="true" class="item_distribution_performance" compatibility="8.1.001" expanded="true" height="76" name="Distribution SoS" width="90" x="380" y="30"/>
<operator activated="true" class="item_distribution_performance" compatibility="8.1.001" expanded="true" height="76" name="Distribution Gini" width="90" x="380" y="120">
<parameter key="measure" value="GiniCoefficient"/>
</operator>
<operator activated="true" class="cluster_distance_performance" compatibility="8.1.001" expanded="true" height="94" name="Distance" width="90" x="380" y="210">
<parameter key="normalize" value="true"/>
</operator>
<operator activated="true" class="cluster_density_performance" compatibility="8.1.001" expanded="true" height="112" name="Density" width="90" x="380" y="345"/>
<operator activated="true" class="log" compatibility="8.1.001" expanded="true" height="76" name="Log: Internal" width="90" x="514" y="30">
<list key="log">
<parameter key="DaviesBouldin" value="operator.Distance.value.DaviesBouldin"/>
<parameter key="avgWithinDistance" value="operator.Distance.value.avg_within_distance"/>
<parameter key="k" value="operator.Log iteration macro.value.macro_value"/>
<parameter key="itemDistribution" value="operator.Distribution SoS.value.item_distribution"/>
<parameter key="Gini" value="operator.Distribution Gini.value.item_distribution"/>
<parameter key="clusterDensity" value="operator.Density.value.clusterdensity"/>
</list>
</operator>
<connect from_op="Recall cluster model" from_port="result" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Recall ith example set" from_port="result" to_op="Log iteration macro" to_port="through 1"/>
<connect from_op="Log iteration macro" from_port="through 1" to_op="Multiply (3)" to_port="input"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="Distribution SoS" to_port="cluster model"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="Distance" to_port="cluster model"/>
<connect from_op="Multiply (2)" from_port="output 3" to_op="Density" to_port="cluster model"/>
<connect from_op="Multiply (3)" from_port="output 1" to_op="Distance" to_port="example set"/>
<connect from_op="Multiply (3)" from_port="output 2" to_op="Density" to_port="example set"/>
<connect from_op="Multiply (3)" from_port="output 3" to_op="Data to Similarity" to_port="example set"/>
<connect from_op="Data to Similarity" from_port="similarity" to_op="Density" to_port="distance measure"/>
<connect from_op="Distribution SoS" from_port="cluster model" to_op="Distribution Gini" to_port="cluster model"/>
<connect from_op="Log: Internal" from_port="through 1" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="source_input 3" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log_to_data" compatibility="8.1.001" expanded="true" height="103" name="Internal validity measures" width="90" x="179" y="660">
<parameter key="log_name" value="Log: Internal"/>
</operator>
<operator activated="true" class="guess_types" compatibility="7.1.001" expanded="true" height="82" name="Internal" width="90" x="313" y="660"/>
<operator activated="true" class="loop" compatibility="8.1.001" expanded="true" height="103" name="Output clusters and partitioning" width="90" x="447" y="30">
<parameter key="set_iteration_macro" value="true"/>
<parameter key="macro_start_value" value="2"/>
<parameter key="iterations" value="%{numberOfClusterIterations}"/>
<process expanded="true">
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall: example set" width="90" x="112" y="75">
<parameter key="name" value="%{iteration}"/>
</operator>
<operator activated="true" class="recall" compatibility="8.1.001" expanded="true" height="60" name="Recall: clusters" width="90" x="112" y="165">
<parameter key="name" value="%{iteration}_model"/>
<parameter key="io_object" value="CentroidClusterModel"/>
</operator>
<connect from_op="Recall: example set" from_port="result" to_port="output 1"/>
<connect from_op="Recall: clusters" from_port="result" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
</operator>
<connect from_op="Generate data for testing" from_port="output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Generate clusters" to_port="input 1"/>
<connect from_op="Execute Process" from_port="result 1" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Generate clusters" from_port="output 1" to_op="Generate ground truth measures" to_port="input 1"/>
<connect from_op="Generate ground truth measures" from_port="performance" to_op="Ground truth measures" to_port="through 1"/>
<connect from_op="Generate ground truth measures" from_port="parameter" to_op="Ground truth measures" to_port="through 2"/>
<connect from_op="Ground truth measures" from_port="exampleSet" to_op="Ground" to_port="example set input"/>
<connect from_op="Ground truth measures" from_port="through 1" to_op="Generate external measures" to_port="input 1"/>
<connect from_op="Ground truth measures" from_port="through 2" to_op="Generate external measures" to_port="input 2"/>
<connect from_op="Ground" from_port="example set output" to_port="result 3"/>
<connect from_op="Generate external measures" from_port="performance" to_op="External validity measures" to_port="through 1"/>
<connect from_op="Generate external measures" from_port="parameter" to_op="External validity measures" to_port="through 2"/>
<connect from_op="External validity measures" from_port="exampleSet" to_op="External" to_port="example set input"/>
<connect from_op="External validity measures" from_port="through 1" to_op="Generate internal measures" to_port="input 1"/>
<connect from_op="External validity measures" from_port="through 2" to_op="Generate internal measures" to_port="input 2"/>
<connect from_op="External" from_port="example set output" to_port="result 4"/>
<connect from_op="Generate internal measures" from_port="output 1" to_op="Internal validity measures" to_port="through 1"/>
<connect from_op="Internal validity measures" from_port="exampleSet" to_op="Internal" to_port="example set input"/>
<connect from_op="Internal" from_port="example set output" to_port="result 5"/>
<connect from_op="Output clusters and partitioning" from_port="output 1" to_port="result 1"/>
<connect from_op="Output clusters and partitioning" from_port="output 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
<description align="left" color="yellow" colored="false" height="453" resized="false" width="400" x="25" y="25"><h1>Important note: </h1>
<br>
<br>
<h2>There are three files needed for this process as follows.</h2>
<ul>
<li>clusterVisualisation.xml - a RapidMiner process - this file</li>
<li>readAndProcessEcoliData.xml - a RapidMiner process called from this process</li>
<li>ecoli.data - the Ecoli data contained in a text file.</li>
</ul>
<br>
Store the processes in your repository.
<br>
<h2>Macros</h2>
In the process context, edit the following macros to ensure you run the correct process to load the ecoli data
<ul>
<li>processToRun - set this to the name of the process that reads and processes the Ecoli data - the default is readAndProcessEcoliData.</li>
<li>locationOfData - set this to the full path to the file containing the Ecoli data that you downloaded.</li>
</ul></description>
</process>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<context>
<input/>
<output/>
<macros>
<macro>
<key>fileToRead</key>
<value>file containing ecoli data</value>
</macro>
</macros>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="6.0.003" expanded="true" height="68" name="Read CSV" width="90" x="112" y="30">
<parameter key="csv_file" value="%{fileToRead}"/>
<parameter key="column_separators" value="[ ]*\s"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="att1.true.polynominal.attribute"/>
<parameter key="1" value="att2.true.real.attribute"/>
<parameter key="2" value="att3.true.real.attribute"/>
<parameter key="3" value="att4.true.real.attribute"/>
<parameter key="4" value="att5.true.real.attribute"/>
<parameter key="5" value="att6.true.real.attribute"/>
<parameter key="6" value="att7.true.real.attribute"/>
<parameter key="7" value="att8.true.real.attribute"/>
<parameter key="8" value="att9.true.polynominal.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="false"/>
</operator>
<operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename" width="90" x="112" y="120">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="sequenceName"/>
<list key="rename_additional_attributes">
<parameter key="att2" value="mcg"/>
<parameter key="att3" value="gvh"/>
<parameter key="att4" value="lip"/>
<parameter key="att5" value="chg"/>
<parameter key="att6" value="aac"/>
<parameter key="att7" value="alm1"/>
<parameter key="att8" value="alm2"/>
<parameter key="att9" value="site"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.013" expanded="true" height="82" name="Set Role" width="90" x="112" y="210">
<parameter key="attribute_name" value="site"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="sequenceName" value="id"/>
</list>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
If it doesn't work, you can find the second process in the attached .rmp file, so that you can continue your work.
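A note for anyone copying the XML from this post: the two processes are pasted back to back, and the XML panel expects a single process at a time, so each one has to be pasted on its own. The following is only a minimal Python sketch of how the pasted text could be split and checked for well-formedness before pasting. The helper names `split_processes` and `is_well_formed` are hypothetical and are not part of RapidMiner; the sketch assumes each process starts with an `<?xml` declaration and ends with a closing `</process>` tag, as in the text above.

```python
import re
import xml.etree.ElementTree as ET


def split_processes(pasted: str) -> list[str]:
    """Split text holding several XML documents at each '<?xml' declaration.

    Hypothetical helper: anything after the last '</process>' of a piece
    (for example forum prose glued to the closing tag) is dropped.
    """
    docs = []
    # The zero-width lookahead keeps the '<?xml' declaration with its document.
    for part in re.split(r"(?=<\?xml\b)", pasted):
        end = part.rfind("</process>")
        if part.lstrip().startswith("<?xml") and end != -1:
            docs.append(part[: end + len("</process>")].strip())
    return docs


def is_well_formed(doc: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False
```

Each string returned by `split_processes` can then be pasted into the XML panel separately. A passing `is_well_formed` check only says that the text is well-formed XML; it does not tell you whether RapidMiner knows every operator in the process, which still depends on the installed extensions. A failing check usually points to a copy-and-paste problem, for example an unclosed tag.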
I hope the mystery will dissipate.
Regards,
Lionel
NB: if you do not support Real Madrid, you will definitely have a bad night :)
1 -
Hi Lionel,
I will try your recommendations.
I am not especially a supporter of Real but I have great respect for them.
The "Fallrückzieher" of Ronaldo was total grandeur. Sorry for the strange German word... In English, it should be called "bicycle kick", I think.
Maerkli
0