Count Price Changes

gianluca_scheid
New Altair Community Member
Dear RapidMiner Community
I have a big data set with product name, retailer name and price at different points in time (time stamps).
My goal is to group the data by product and retailer name and count the number of price changes across all time stamps.
Is there an easy way to do this?
Thank you for your help
Best regards
GL
I have a big data set with product name, retailer name and price at different points in time (time stamps).
My goal is to group the data by product and retailer name and count the number of price changes across all time stamps.
Is there an easy way to do this?
Thank you for your help
Best regards
GL
0
Best Answer
-
Ahh!The attached process should do the trick. It needs Operator Toolbox to run. Sadly i have no time to add additional comments right now. I hope this works for you.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"><br> <context><br> <input/><br> <output/><br> <macros/><br> </context><br> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"><br> <parameter key="logverbosity" value="init"/><br> <parameter key="random_seed" value="2001"/><br> <parameter key="send_mail" value="never"/><br> <parameter key="notification_email" value=""/><br> <parameter key="process_duration_for_mail" value="30"/><br> <parameter key="encoding" value="SYSTEM"/><br> <process expanded="true"><br> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.8.000-SNAPSHOT" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85"><br> <parameter key="generator_type" value="comma_separated_text"/><br> <parameter key="number_of_examples" value="100"/><br> <parameter key="use_stepsize" value="false"/><br> <list key="function_descriptions"/><br> <parameter key="add_id_attribute" value="false"/><br> <list key="numeric_series_configuration"/><br> <list key="date_series_configuration"/><br> <list key="date_series_configuration (interval)"/><br> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/><br> <parameter key="input_csv_text" value="product, retailer, price Product X, Retailer Y, 500 Product X, Retailer Y, 500 Product X, Retailer Y, 550 Product X, Retailer Y, 600 Product X, Retailer Y, 500 Product X, Retailer Y, 600 Product Y, Retailer Y, 500 Product Y, Retailer Y, 500 Product Y, Retailer Y, 550 Product Y, Retailer Y, 600 Product Y, Retailer Y, 500 Product Y, Retailer Y, 600"/><br> <parameter key="column_separator" value=","/><br> <parameter key="parse_all_as_nominal" value="false"/><br> <parameter key="decimal_point_character" value="."/><br> <parameter key="trim_attribute_names" value="true"/><br> </operator><br> <operator activated="true" class="generate_concatenation" compatibility="9.1.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="246" y="85"><br> <parameter key="first_attribute" value="product"/><br> <parameter key="second_attribute" value="retailer"/><br> <parameter key="separator" value="_"/><br> <parameter key="trim_values" value="false"/><br> </operator><br> <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="447" y="85"><br> <parameter key="group_by_attribute" value="product_retailer"/><br> <parameter key="group_by_attribute (numerical)" value=""/><br> <parameter key="sorting_order" value="none"/><br> </operator><br> <operator activated="true" class="loop_collection" compatibility="9.1.000" expanded="true" height="82" name="Loop Collection" width="90" x="581" y="85"><br> <parameter key="set_iteration_macro" value="false"/><br> <parameter key="macro_name" value="iteration"/><br> <parameter key="macro_start_value" value="1"/><br> <parameter key="unfold" value="false"/><br> <process expanded="true"><br> <operator activated="true" class="operator_toolbox:generate_session_id" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Generate Session ID" width="90" x="112" y="34"><br> <parameter key="date_attribute" value="price"/><br> <parameter key="gap_threshold" value="0.0"/><br> <parameter key="gap_unit" value="none"/><br> <parameter key="use_absolutes" value="false"/><br> </operator><br> <operator activated="true" class="remove_duplicates" compatibility="9.1.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="246" y="34"><br> <parameter key="attribute_filter_type" value="single"/><br> <parameter key="attribute" value="Session id"/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="true"/><br> <parameter key="treat_missing_values_as_duplicates" value="false"/><br> </operator><br> <operator activated="true" class="aggregate" compatibility="9.1.000" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34"><br> <parameter key="use_default_aggregation" value="false"/><br> <parameter key="attribute_filter_type" value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="default_aggregation_function" value="average"/><br> <list key="aggregation_attributes"><br> <parameter key="product" value="count"/><br> </list><br> <parameter key="group_by_attributes" value="retailer|product"/><br> <parameter key="count_all_combinations" value="false"/><br> <parameter key="only_distinct" value="false"/><br> <parameter key="ignore_missings" value="true"/><br> </operator><br> <connect from_port="single" to_op="Generate Session ID" to_port="exa"/><br> <connect from_op="Generate Session ID" from_port="exa" to_op="Remove Duplicates" to_port="example set input"/><br> <connect from_op="Remove Duplicates" from_port="example set output" to_op="Aggregate" to_port="example set input"/><br> <connect from_op="Aggregate" from_port="example set output" to_port="output 1"/><br> <portSpacing port="source_single" spacing="0"/><br> <portSpacing port="sink_output 1" spacing="0"/><br> <portSpacing port="sink_output 2" spacing="0"/><br> </process><br> </operator><br> <operator activated="true" class="append" compatibility="9.1.000" expanded="true" height="82" name="Append" width="90" x="715" y="85"><br> <parameter key="datamanagement" value="double_array"/><br> <parameter key="data_management" value="auto"/><br> <parameter key="merge_type" value="all"/><br> </operator><br> <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Concatenation" to_port="example set input"/><br> <connect from_op="Generate Concatenation" from_port="example set output" to_op="Group Into Collection" to_port="exa"/><br> <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/><br> <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/><br> <connect from_op="Append" from_port="merged set" to_port="result 1"/><br> <portSpacing port="source_input 1" spacing="0"/><br> <portSpacing port="sink_result 1" spacing="0"/><br> <portSpacing port="sink_result 2" spacing="0"/><br> </process><br> </operator><br></process><br><br>
3
Answers
-
Hi @gianluca_scheid ,i think you can just use the Aggregate operator. Group by product name and relatailer and use count on price to get the #.BR,Martin1
-
Hi Martin
Thank you for your suggestion.
I had already considered this option, however, it will only count the number of distinct prices but not the number of price changes:
For example:
t1: Product X, Retailer Y, 500USD
t2: Product X, Retailer Y, 500USD
t3: Product X, Retailer Y, 550USD
t4: Product X, Retailer Y, 600USD
t5: Product X, Retailer Y, 500USD
t6: Product X, Retailer Y, 600USD
When counting prices using the Aggregate function I'd get: Product X, Retailer Y, 3 distinct prices (500, 550, 600)
However, the price changed 4 times (500 to 550, 550 to 600, 600 to 500, 500 to 600)
Best regards
Gianluca0 -
Ahh!The attached process should do the trick. It needs Operator Toolbox to run. Sadly i have no time to add additional comments right now. I hope this works for you.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"><br> <context><br> <input/><br> <output/><br> <macros/><br> </context><br> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"><br> <parameter key="logverbosity" value="init"/><br> <parameter key="random_seed" value="2001"/><br> <parameter key="send_mail" value="never"/><br> <parameter key="notification_email" value=""/><br> <parameter key="process_duration_for_mail" value="30"/><br> <parameter key="encoding" value="SYSTEM"/><br> <process expanded="true"><br> <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.8.000-SNAPSHOT" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85"><br> <parameter key="generator_type" value="comma_separated_text"/><br> <parameter key="number_of_examples" value="100"/><br> <parameter key="use_stepsize" value="false"/><br> <list key="function_descriptions"/><br> <parameter key="add_id_attribute" value="false"/><br> <list key="numeric_series_configuration"/><br> <list key="date_series_configuration"/><br> <list key="date_series_configuration (interval)"/><br> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/><br> <parameter key="input_csv_text" value="product, retailer, price Product X, Retailer Y, 500 Product X, Retailer Y, 500 Product X, Retailer Y, 550 Product X, Retailer Y, 600 Product X, Retailer Y, 500 Product X, Retailer Y, 600 Product Y, Retailer Y, 500 Product Y, Retailer Y, 500 Product Y, Retailer Y, 550 Product Y, Retailer Y, 600 Product Y, Retailer Y, 500 Product Y, Retailer Y, 600"/><br> <parameter key="column_separator" value=","/><br> <parameter key="parse_all_as_nominal" value="false"/><br> <parameter key="decimal_point_character" value="."/><br> <parameter key="trim_attribute_names" value="true"/><br> </operator><br> <operator activated="true" class="generate_concatenation" compatibility="9.1.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="246" y="85"><br> <parameter key="first_attribute" value="product"/><br> <parameter key="second_attribute" value="retailer"/><br> <parameter key="separator" value="_"/><br> <parameter key="trim_values" value="false"/><br> </operator><br> <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="447" y="85"><br> <parameter key="group_by_attribute" value="product_retailer"/><br> <parameter key="group_by_attribute (numerical)" value=""/><br> <parameter key="sorting_order" value="none"/><br> </operator><br> <operator activated="true" class="loop_collection" compatibility="9.1.000" expanded="true" height="82" name="Loop Collection" width="90" x="581" y="85"><br> <parameter key="set_iteration_macro" value="false"/><br> <parameter key="macro_name" value="iteration"/><br> <parameter key="macro_start_value" value="1"/><br> <parameter key="unfold" value="false"/><br> <process expanded="true"><br> <operator activated="true" class="operator_toolbox:generate_session_id" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Generate Session ID" width="90" x="112" y="34"><br> <parameter key="date_attribute" value="price"/><br> <parameter key="gap_threshold" value="0.0"/><br> <parameter key="gap_unit" value="none"/><br> <parameter key="use_absolutes" value="false"/><br> </operator><br> <operator activated="true" class="remove_duplicates" compatibility="9.1.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="246" y="34"><br> <parameter key="attribute_filter_type" value="single"/><br> <parameter key="attribute" value="Session id"/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="true"/><br> <parameter key="treat_missing_values_as_duplicates" value="false"/><br> </operator><br> <operator activated="true" class="aggregate" compatibility="9.1.000" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34"><br> <parameter key="use_default_aggregation" value="false"/><br> <parameter key="attribute_filter_type" value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="default_aggregation_function" value="average"/><br> <list key="aggregation_attributes"><br> <parameter key="product" value="count"/><br> </list><br> <parameter key="group_by_attributes" value="retailer|product"/><br> <parameter key="count_all_combinations" value="false"/><br> <parameter key="only_distinct" value="false"/><br> <parameter key="ignore_missings" value="true"/><br> </operator><br> <connect from_port="single" to_op="Generate Session ID" to_port="exa"/><br> <connect from_op="Generate Session ID" from_port="exa" to_op="Remove Duplicates" to_port="example set input"/><br> <connect from_op="Remove Duplicates" from_port="example set output" to_op="Aggregate" to_port="example set input"/><br> <connect from_op="Aggregate" from_port="example set output" to_port="output 1"/><br> <portSpacing port="source_single" spacing="0"/><br> <portSpacing port="sink_output 1" spacing="0"/><br> <portSpacing port="sink_output 2" spacing="0"/><br> </process><br> </operator><br> <operator activated="true" class="append" compatibility="9.1.000" expanded="true" height="82" name="Append" width="90" x="715" y="85"><br> <parameter key="datamanagement" value="double_array"/><br> <parameter key="data_management" value="auto"/><br> <parameter key="merge_type" value="all"/><br> </operator><br> <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Concatenation" to_port="example set input"/><br> <connect from_op="Generate Concatenation" from_port="example set output" to_op="Group Into Collection" to_port="exa"/><br> <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/><br> <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/><br> <connect from_op="Append" from_port="merged set" to_port="result 1"/><br> <portSpacing port="source_input 1" spacing="0"/><br> <portSpacing port="sink_result 1" spacing="0"/><br> <portSpacing port="sink_result 2" spacing="0"/><br> </process><br> </operator><br></process><br><br>
3 -
Hi Martin
Thank you for your help. Unfortunately, I don't know (yet) how to implement the xml code into my process...
Best regards
Gianluca0 -
@gianluca_scheid
You just need to open new process then go to menu bar View--> Show panel --> XML. Now you will get an XML window, delete code in XML window and just copy the complete code from @mschmitz and paste it there. Once you paste it click green tick mark so that you can see the processes in the process window. hope this helps
Thanks,
Varun4 -
-
You can also use the Lag operator in the Finance and Economics extension to calculate the price difference between rows---and any non-zero value is then a price change. You could additionally aggregate on those if needed.2
-
Hi @Telcontar120 ,while this is of course technically possible, i would still recommend to use my options, since it is more memory efficient.BR,Martin0
-
Of course, @mschmitz, as is often the case I am simply pointing out another possible way of doing the same task. There are always so many options in RapidMiner!4