Pivot log data by dates
leviavihay
New Altair Community Member
Hi all,
I'm trying to use RM in a certain task in which I have (kind of) log data in the following (abstract) schema:
Thanks in advanced
I'm trying to use RM in a certain task in which I have (kind of) log data in the following (abstract) schema:
- ID | Date | Result
- id1: [date1: result1, date2: result2, ... ]
- id2: [date11: result11, date22: result22, ... ]
Thanks in advanced
Tagged:
0
Best Answer
-
HI @leviavihay,
I propose the following "approximative" solution with the association Generate Concatenation, Pivot, Generate Aggregation operators :
The process :<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" breakpoints="after" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85"> <parameter key="generator_type" value="comma separated text"/> <parameter key="number_of_examples" value="100"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="input_csv_text" value="ID,Date,Result 1,date1,result1 1,date2,result2 2,date11,result11 2,date22,result22 "/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_concatenation" compatibility="9.2.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="246" y="85"> <parameter key="first_attribute" value="Date"/> <parameter key="second_attribute" value="Result"/> <parameter key="separator" value=":"/> <parameter key="trim_values" value="false"/> </operator> <operator activated="true" class="blending:pivot" compatibility="9.2.000" expanded="true" height="82" name="Pivot" width="90" x="380" y="85"> <parameter key="group_by_attributes" value="ID"/> <parameter key="column_grouping_attribute" value="Date:Result"/> <list key="aggregation_attributes"> <parameter key="Date:Result" value="concatenation"/> </list> <parameter key="use_default_aggregation" value="false"/> <parameter key="default_aggregation_function" value="first"/> </operator> <operator activated="true" class="generate_aggregation" compatibility="9.2.000" expanded="true" height="82" name="Generate Aggregation" width="90" x="514" y="85"> <parameter key="attribute_name" value="vector"/> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value=""/> <parameter key="attributes" value="ID"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="false"/> <parameter key="aggregation_function" value="concatenation"/> <parameter key="concatenation_separator" value=","/> <parameter key="keep_all" value="true"/> <parameter key="ignore_missings" value="true"/> <parameter key="ignore_missing_attributes" value="false"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="85"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value=""/> <parameter key="attributes" value="ID|vector"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Concatenation" to_port="example set input"/> <connect from_op="Generate Concatenation" from_port="example set output" to_op="Pivot" to_port="input"/> <connect from_op="Pivot" from_port="output" to_op="Generate Aggregation" to_port="example set input"/> <connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Hope this helps,
Regards,
Lionel
1
Answers
-
Can you provide actual example datasets that corrspond to your input and desired output? Simply based on your description, I don't think it is feasible because the attribute either has to be generic (like "Date") or you are going to get a set of attributes for each possible value of Date. But it is hard to say for sure without a concrete example.
You should also think about whether you actually need it in the form you are describing. The original ID/Date/Result format is a pretty standard structure that should be suitable for many ML and data science needs.0 -
yes tricky ETL problem...would want to know more context.
Scott
0 -
HI @leviavihay,
I propose the following "approximative" solution with the association Generate Concatenation, Pivot, Generate Aggregation operators :
The process :<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" breakpoints="after" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85"> <parameter key="generator_type" value="comma separated text"/> <parameter key="number_of_examples" value="100"/> <parameter key="use_stepsize" value="false"/> <list key="function_descriptions"/> <parameter key="add_id_attribute" value="false"/> <list key="numeric_series_configuration"/> <list key="date_series_configuration"/> <list key="date_series_configuration (interval)"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="input_csv_text" value="ID,Date,Result 1,date1,result1 1,date2,result2 2,date11,result11 2,date22,result22 "/> <parameter key="column_separator" value=","/> <parameter key="parse_all_as_nominal" value="false"/> <parameter key="decimal_point_character" value="."/> <parameter key="trim_attribute_names" value="true"/> </operator> <operator activated="true" class="generate_concatenation" compatibility="9.2.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="246" y="85"> <parameter key="first_attribute" value="Date"/> <parameter key="second_attribute" value="Result"/> <parameter key="separator" value=":"/> <parameter key="trim_values" value="false"/> </operator> <operator activated="true" class="blending:pivot" compatibility="9.2.000" expanded="true" height="82" name="Pivot" width="90" x="380" y="85"> <parameter key="group_by_attributes" value="ID"/> <parameter key="column_grouping_attribute" value="Date:Result"/> <list key="aggregation_attributes"> <parameter key="Date:Result" value="concatenation"/> </list> <parameter key="use_default_aggregation" value="false"/> <parameter key="default_aggregation_function" value="first"/> </operator> <operator activated="true" class="generate_aggregation" compatibility="9.2.000" expanded="true" height="82" name="Generate Aggregation" width="90" x="514" y="85"> <parameter key="attribute_name" value="vector"/> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value=""/> <parameter key="attributes" value="ID"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="false"/> <parameter key="aggregation_function" value="concatenation"/> <parameter key="concatenation_separator" value=","/> <parameter key="keep_all" value="true"/> <parameter key="ignore_missings" value="true"/> <parameter key="ignore_missing_attributes" value="false"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="85"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value=""/> <parameter key="attributes" value="ID|vector"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Concatenation" to_port="example set input"/> <connect from_op="Generate Concatenation" from_port="example set output" to_op="Pivot" to_port="input"/> <connect from_op="Pivot" from_port="output" to_op="Generate Aggregation" to_port="example set input"/> <connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Hope this helps,
Regards,
Lionel
1 -
nicely done @lionelderkrikor. I played with this for a few min and could not get the desired result. Your solution works well.1
-
thanks you @sgenzer,
I started to search a solution with a Python script....before I realized that it was feasible with RapidMiner native operator....
Regards,
Lionel1 -
Hi @lionelderkrikor
First - Thank you and sorry for the late reply. Your solution was helpful.
Second - This solution has one problem (from my task perspective) : it concatenated events based on all seen dates.
For example, this toy case:- id1, date1, result1
- id1, date2, result2
- id2, date3, result3
- id1: [date1: result1, date2: result2, ?]
- id2: [?, ?, date3: result3]
But, it's not a big problem, it's a string so I removed all "?," using regex.
Thanks!
Avihay
0 -
Hi all
I just posted a question regarding the next step of my problem. I posted it in a new thread for easier future search.
https://community.rapidminer.com/discussion/54763
0