"split operator - export data not complete for further use (operators)"

joei
New Altair Community Member
Hello,
the Split operator only makes the first three columns available for further use, even if it created more. In the result view I see all split columns (more than three), but I cannot select them in another operator (only the first three are visible).
Here is a simple table to try it with:
bla split
asdf 2345x2134
dsaf 2345x2345x345x456x356x3546
sadf 2435x2345
Answers
-
Hi,
my quick test process (XML below) worked fine; I could select attributes up to "split_6" in further operators.
Can you provide the XML of your process that does not work?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.1.000-SNAPSHOT" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.1.000-SNAPSHOT" expanded="true" height="68" name="Retrieve 123" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Local Repository/123"/>
      </operator>
      <operator activated="true" class="split" compatibility="7.1.000-SNAPSHOT" expanded="true" height="82" name="Split" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="split"/>
        <parameter key="split_pattern" value="x"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.1.000-SNAPSHOT" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="split_6.contains.35"/>
        </list>
      </operator>
      <connect from_op="Retrieve 123" from_port="output" to_op="Split" to_port="example set input"/>
      <connect from_op="Split" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <connect from_op="Filter Examples" from_port="original" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Regards,
Marco
-
Of course. (My post wasn't complete; I accidentally created two posts.)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="5.3.013" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
        <parameter key="excel_file" value="rapidminer_split_text.xlsx"/>
        <parameter key="imported_cell_range" value="A1:B4"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="bla.true.polynominal.attribute"/>
          <parameter key="1" value="split.true.nominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="split" compatibility="5.3.013" expanded="true" height="76" name="Split" width="90" x="180" y="52">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|split"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="split_pattern" value="x"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.3.013" expanded="true" height="94" name="Multiply" width="90" x="315" y="30"/>
      <operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="450" y="30"/>
      <connect from_op="Read Excel" from_port="output" to_op="Split" to_port="example set input"/>
      <connect from_op="Split" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_port="result 2"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
-
Hi,
1. RapidMiner 5.3 is old. Like really old. We cannot provide help for it here anymore. Please consider using RapidMiner Studio 7.0 instead.
2. You are using the Split operator after "Read Excel". The problem is that the output of Read Excel depends on actually reading the Excel file at runtime, so until the process runs we don't know what the result will be. Therefore the Split operator creates a dummy output to show an example of what it could look like.
To use actual data, load it into the repository first, then access it with a "Retrieve" operator. That way, you have full metadata available and the Split operator preview will be correct.
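As a rough sketch, a one-off import process for that first step could look like the fragment below. The Read Excel settings are copied from your process. The Store operator, its port names ("input", "through") and the repository path //Local Repository/split_data are assumptions for illustration and may differ in your version:

```xml
<process expanded="true">
  <operator activated="true" class="read_excel" compatibility="5.3.013" expanded="true" name="Read Excel">
    <parameter key="excel_file" value="rapidminer_split_text.xlsx"/>
    <parameter key="imported_cell_range" value="A1:B4"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
      <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
      <parameter key="0" value="bla.true.polynominal.attribute"/>
      <parameter key="1" value="split.true.nominal.attribute"/>
    </list>
  </operator>
  <!-- Assumed: a Store operator that writes the example set to a hypothetical repository entry -->
  <operator activated="true" class="store" compatibility="5.3.013" expanded="true" name="Store">
    <parameter key="repository_entry" value="//Local Repository/split_data"/>
  </operator>
  <connect from_op="Read Excel" from_port="output" to_op="Store" to_port="input"/>
  <connect from_op="Store" from_port="through" to_port="result 1"/>
</process>
```

After running this once, the actual process would start with a Retrieve operator whose repository_entry points at the same path, followed by Split.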
Regards,
Marco
-
The Filter Examples operator also works in my example.
But I still can't see the split columns beyond the third in the operators Select Attributes, Rename, and Remove Duplicates (subset).
-
Hi,
yes, that is expected due to the "can't know beforehand" problem. You can still manually change those parameters if you know you will end up with 6 splits for example.
But the easiest solution is to read the data into your repository, then only use the data from the repository in your process. That way you have the actual information available during construction time.
Regards,
Marco
-
OK, thank you.
-
How does the manual change work? The data is too big to load into the repository.
-
Hi,
your local repository sits on your file system, so the data cannot be too big for it.
How to set it manually depends on the parameter. For example, for "Remove Duplicates" you can select 'subset', then type the name, e.g. "split_6", into the upper-right text field and press +.
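In the process XML, the same manual entry for Select Attributes could look roughly like the sketch below. It assumes Select Attributes stores its subset the same way the Split operator in your process does, as a |-separated list under the attributes key. The names split_4 to split_6 are only an example matching the sample table:

```xml
<operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="450" y="30">
  <parameter key="attribute_filter_type" value="subset"/>
  <!-- Names typed in by hand: the design-time preview does not list them, -->
  <!-- but they exist at runtime once Split has produced six parts. -->
  <parameter key="attributes" value="split_4|split_5|split_6"/>
</operator>
```

If a row splits into fewer parts than the names listed here, the operator may complain about missing attributes at runtime, so it is worth checking the result view for the actual column names first.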
Regards,
Marco