Why does the Split operator not keep the values?
SylvainM
New Altair Community Member
Hello everyone,
As usual, I have quite a simple task to do but I haven't found an easy answer to it yet.
Let's say I have that Example Set:
Sale | Clients | Product |
A | x | CCD |
B | x ; y | CCD |
C | y | CS |
D | y | CCD |
E | x ; z | DU |
F | x ; y ; z | CS |
G | y ; z | DU |
And I want to split it to get that new Example Set:
Sale | Clients | Product |
A | x | CCD |
B | x | CCD |
B | y | CCD |
C | y | CS |
D | y | CCD |
E | x | DU |
E | z | DU |
F | x | CS |
F | y | CS |
F | z | CS |
G | y | DU |
G | z | DU |
What I thought first was: "Hey! You can do that! Let's do a Transpose, then a Split, and a Transpose again!" But this is what I get:
Sale | Clients | Product |
? | y | ? |
? | z | ? |
? | y | ? |
? | z | ? |
? | z | ? |
A | x | CCD |
B | x | CCD |
C | y | CS |
D | y | CCD |
E | x | DU |
F | x | CS |
G | y | DU |
Why did the Split operator not keep the values associated with the split examples? And what can I do to keep them?
I'm sorry if my question is obvious: I'm still exploring RapidMiner.
Thanks a lot and best to all,
Sylvain
Answers
Update
Hello everyone,
After an hour of thought, I now understand why I got that result: Split gives an empty value if there is nothing to split; it doesn't copy the example.
It does not solve my problem, however, which is to copy the example when there's nothing to split... Any advice on that point?
Thanks a lot for your help.
Sylvain
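To illustrate the behaviour described in the update above, here is a minimal Python sketch of a column-wise split that pads short entries with missing values instead of copying them. It is not RapidMiner's implementation; the helper name `split_into_columns` and the use of `None` for a missing value are assumptions made for this illustration only.

```python
# Hypothetical helper (not a RapidMiner API): split each value into columns
# and pad shorter rows with None, which stands in for a missing value ("?").
def split_into_columns(values, pattern=" ; "):
    parts = [value.split(pattern) for value in values]
    width = max(len(p) for p in parts)
    # Rows with fewer pieces are padded, not duplicated.
    return [p + [None] * (width - len(p)) for p in parts]


clients = ["x", "x ; y", "x ; y ; z"]
columns = split_into_columns(clients)
for row in columns:
    print(row)
```

The entry with a single client keeps its value only in the first column; the remaining columns stay empty, which is why a split alone cannot produce one complete row per client.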
Hi @SylvainM,
Nice challenge!
The process below should do the trick. I have used a combination of Split, De-Pivot, and Filter Examples. The other two operators are simply used to make the result look nicer / exactly like your example above...
Hope this helps,
Ingo
P.S.: If anybody finds a shorter solution, pls let me know - I only spent 5 minutes on this but I somehow feel that there is something shorter out there...
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.3.000-BETA2">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.000-BETA2" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.3.000-BETA2" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Sale,Clients,Product&#10;A,x,CCD&#10;B,x ; y,CCD&#10;C,y,CS&#10;D,y,CCD&#10;E,x ; z,DU&#10;F,x ; y ; z,CS&#10;G,y ; z,DU&#10;"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="split" compatibility="9.3.000-BETA2" expanded="true" height="82" name="Split" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Clients"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="split_pattern" value=" ; "/>
        <parameter key="split_mode" value="unordered_split"/>
      </operator>
      <operator activated="true" class="de_pivot" compatibility="9.3.000-BETA2" expanded="true" height="82" name="De-Pivot" width="90" x="313" y="34">
        <list key="attribute_name">
          <parameter key="TO_REMOVE" value="Clients_.*"/>
        </list>
        <parameter key="index_attribute" value="Client"/>
        <parameter key="create_nominal_index" value="true"/>
        <parameter key="keep_missings" value="false"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.3.000-BETA2" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="TO_REMOVE.equals.true"/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="replace" compatibility="9.3.000-BETA2" expanded="true" height="82" name="Replace" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Client"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="replace_what" value="Clients_(.*)"/>
        <parameter key="replace_by" value="$1"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.3.000-BETA2" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="TO_REMOVE"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Split" to_port="example set input"/>
      <connect from_op="Split" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
      <connect from_op="De-Pivot" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Replace" to_port="example set input"/>
      <connect from_op="Replace" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
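For readers who want to follow the logic outside RapidMiner, the three core steps of the process (unordered Split, De-Pivot, Filter Examples) can be mirrored in plain Python. This is only a rough sketch under stated assumptions: the function name `one_row_per_client` is made up for the illustration, the flags are modelled as Python booleans, and the row order that RapidMiner's De-Pivot produces may differ from the order shown here.

```python
# The example set from the question, as (Sale, Clients, Product) tuples.
rows = [
    ("A", "x", "CCD"),
    ("B", "x ; y", "CCD"),
    ("C", "y", "CS"),
    ("D", "y", "CCD"),
    ("E", "x ; z", "DU"),
    ("F", "x ; y ; z", "CS"),
    ("G", "y ; z", "DU"),
]


def one_row_per_client(rows, pattern=" ; "):
    """Hypothetical helper mirroring Split -> De-Pivot -> Filter Examples."""
    # Step 1 (like an unordered Split): one true/false flag per distinct client.
    all_clients = sorted(
        {c for _, clients, _ in rows for c in clients.split(pattern)}
    )
    flagged = [
        (sale, product, {c: c in clients.split(pattern) for c in all_clients})
        for sale, clients, product in rows
    ]
    # Step 2 (like De-Pivot): one long row per (sale, client) pair, flag kept.
    long_rows = [
        (sale, client, product, present)
        for sale, product, flags in flagged
        for client, present in flags.items()
    ]
    # Step 3 (like Filter Examples): keep pairs whose flag is true, drop the flag.
    return [
        (sale, client, product)
        for sale, client, product, present in long_rows
        if present
    ]


result = one_row_per_client(rows)
for row in result:
    print(" | ".join(row))
```

Because every sale gets a flag for every client before filtering, sales with a single client survive as one complete row, so nothing is left with missing Sale or Product values.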
Hello Ingo,
Your solution works perfectly well! It is short and clear, as I like it.
Thank you soooo much! I'm learning a lot with your help.
Best,
Sylvain