Count Price Changes

gianluca_scheid
gianluca_scheid New Altair Community Member
edited November 2024 in Community Q&A
Dear RapidMiner Community

I have a big data set with product name, retailer name and price at different points in time (time stamps).
My goal is to group the data by product and retailer name and count the number of price changes across all time stamps.

Is there an easy way to do this?

Thank you for your help

Best regards
GL

Best Answer

  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓
    Ahh!
    The attached process should do the trick. It needs Operator Toolbox to run. Sadly i have no time to add additional comments right now. I hope this works for you.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"><br>  <context><br>    <input/><br>    <output/><br>    <macros/><br>  </context><br>  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"><br>    <parameter key="logverbosity" value="init"/><br>    <parameter key="random_seed" value="2001"/><br>    <parameter key="send_mail" value="never"/><br>    <parameter key="notification_email" value=""/><br>    <parameter key="process_duration_for_mail" value="30"/><br>    <parameter key="encoding" value="SYSTEM"/><br>    <process expanded="true"><br>      <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.8.000-SNAPSHOT" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85"><br>        <parameter key="generator_type" value="comma_separated_text"/><br>        <parameter key="number_of_examples" value="100"/><br>        <parameter key="use_stepsize" value="false"/><br>        <list key="function_descriptions"/><br>        <parameter key="add_id_attribute" value="false"/><br>        <list key="numeric_series_configuration"/><br>        <list key="date_series_configuration"/><br>        <list key="date_series_configuration (interval)"/><br>        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/><br>        <parameter key="input_csv_text" value="product, retailer, price&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 550&#10;Product X, Retailer Y, 600&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 600&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 550&#10;Product Y, Retailer Y, 600&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 600"/><br>        <parameter key="column_separator" value=","/><br>        <parameter key="parse_all_as_nominal" value="false"/><br>        <parameter key="decimal_point_character" value="."/><br>        <parameter key="trim_attribute_names" value="true"/><br>      </operator><br>      <operator activated="true" class="generate_concatenation" compatibility="9.1.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="246" y="85"><br>        <parameter key="first_attribute" value="product"/><br>        <parameter key="second_attribute" value="retailer"/><br>        <parameter key="separator" value="_"/><br>        <parameter key="trim_values" value="false"/><br>      </operator><br>      <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="447" y="85"><br>        <parameter key="group_by_attribute" value="product_retailer"/><br>        <parameter key="group_by_attribute (numerical)" value=""/><br>        <parameter key="sorting_order" value="none"/><br>      </operator><br>      <operator activated="true" class="loop_collection" compatibility="9.1.000" expanded="true" height="82" name="Loop Collection" width="90" x="581" y="85"><br>        <parameter key="set_iteration_macro" value="false"/><br>        <parameter key="macro_name" value="iteration"/><br>        <parameter key="macro_start_value" value="1"/><br>        <parameter key="unfold" value="false"/><br>        <process expanded="true"><br>          <operator activated="true" class="operator_toolbox:generate_session_id" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Generate Session ID" width="90" x="112" y="34"><br>            <parameter key="date_attribute" value="price"/><br>            <parameter key="gap_threshold" value="0.0"/><br>            <parameter key="gap_unit" value="none"/><br>            <parameter key="use_absolutes" value="false"/><br>          </operator><br>          <operator activated="true" class="remove_duplicates" compatibility="9.1.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="246" y="34"><br>            <parameter key="attribute_filter_type" value="single"/><br>            <parameter key="attribute" value="Session id"/><br>            <parameter key="attributes" value=""/><br>            <parameter key="use_except_expression" value="false"/><br>            <parameter key="value_type" value="attribute_value"/><br>            <parameter key="use_value_type_exception" value="false"/><br>            <parameter key="except_value_type" value="time"/><br>            <parameter key="block_type" value="attribute_block"/><br>            <parameter key="use_block_type_exception" value="false"/><br>            <parameter key="except_block_type" value="value_matrix_row_start"/><br>            <parameter key="invert_selection" value="false"/><br>            <parameter key="include_special_attributes" value="true"/><br>            <parameter key="treat_missing_values_as_duplicates" value="false"/><br>          </operator><br>          <operator activated="true" class="aggregate" compatibility="9.1.000" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34"><br>            <parameter key="use_default_aggregation" value="false"/><br>            <parameter key="attribute_filter_type" value="all"/><br>            <parameter key="attribute" value=""/><br>            <parameter key="attributes" value=""/><br>            <parameter key="use_except_expression" value="false"/><br>            <parameter key="value_type" value="attribute_value"/><br>            <parameter key="use_value_type_exception" value="false"/><br>            <parameter key="except_value_type" value="time"/><br>            <parameter key="block_type" value="attribute_block"/><br>            <parameter key="use_block_type_exception" value="false"/><br>            <parameter key="except_block_type" value="value_matrix_row_start"/><br>            <parameter key="invert_selection" value="false"/><br>            <parameter key="include_special_attributes" value="false"/><br>            <parameter key="default_aggregation_function" value="average"/><br>            <list key="aggregation_attributes"><br>              <parameter key="product" value="count"/><br>            </list><br>            <parameter key="group_by_attributes" value="retailer|product"/><br>            <parameter key="count_all_combinations" value="false"/><br>            <parameter key="only_distinct" value="false"/><br>            <parameter key="ignore_missings" value="true"/><br>          </operator><br>          <connect from_port="single" to_op="Generate Session ID" to_port="exa"/><br>          <connect from_op="Generate Session ID" from_port="exa" to_op="Remove Duplicates" to_port="example set input"/><br>          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Aggregate" to_port="example set input"/><br>          <connect from_op="Aggregate" from_port="example set output" to_port="output 1"/><br>          <portSpacing port="source_single" spacing="0"/><br>          <portSpacing port="sink_output 1" spacing="0"/><br>          <portSpacing port="sink_output 2" spacing="0"/><br>        </process><br>      </operator><br>      <operator activated="true" class="append" compatibility="9.1.000" expanded="true" height="82" name="Append" width="90" x="715" y="85"><br>        <parameter key="datamanagement" value="double_array"/><br>        <parameter key="data_management" value="auto"/><br>        <parameter key="merge_type" value="all"/><br>      </operator><br>      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Concatenation" to_port="example set input"/><br>      <connect from_op="Generate Concatenation" from_port="example set output" to_op="Group Into Collection" to_port="exa"/><br>      <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/><br>      <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/><br>      <connect from_op="Append" from_port="merged set" to_port="result 1"/><br>      <portSpacing port="source_input 1" spacing="0"/><br>      <portSpacing port="sink_result 1" spacing="0"/><br>      <portSpacing port="sink_result 2" spacing="0"/><br>    </process><br>  </operator><br></process><br><br>


Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    i think you can just use the Aggregate operator. Group by product name and relatailer and use count on price to get the #.
    BR,
    Martin
  • gianluca_scheid
    gianluca_scheid New Altair Community Member
    Hi Martin

    Thank you for your suggestion.

    I had already considered this option, however, it will only count the number of distinct prices but not the number of price changes:

    For example:

    t1: Product X, Retailer Y, 500USD
    t2: Product X, Retailer Y, 500USD
    t3: Product X, Retailer Y, 550USD
    t4: Product X, Retailer Y, 600USD
    t5: Product X, Retailer Y, 500USD
    t6: Product X, Retailer Y, 600USD

    When counting prices using the Aggregate function I'd get: Product X, Retailer Y, 3 distinct prices (500, 550, 600)
    However, the price changed 4 times (500 to 550, 550 to 600, 600 to 500, 500 to 600)

    Best regards
    Gianluca
  • MartinLiebig
    MartinLiebig
    Altair Employee
    Answer ✓
    Ahh!
    The attached process should do the trick. It needs Operator Toolbox to run. Sadly i have no time to add additional comments right now. I hope this works for you.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"><br>  <context><br>    <input/><br>    <output/><br>    <macros/><br>  </context><br>  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"><br>    <parameter key="logverbosity" value="init"/><br>    <parameter key="random_seed" value="2001"/><br>    <parameter key="send_mail" value="never"/><br>    <parameter key="notification_email" value=""/><br>    <parameter key="process_duration_for_mail" value="30"/><br>    <parameter key="encoding" value="SYSTEM"/><br>    <process expanded="true"><br>      <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.8.000-SNAPSHOT" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85"><br>        <parameter key="generator_type" value="comma_separated_text"/><br>        <parameter key="number_of_examples" value="100"/><br>        <parameter key="use_stepsize" value="false"/><br>        <list key="function_descriptions"/><br>        <parameter key="add_id_attribute" value="false"/><br>        <list key="numeric_series_configuration"/><br>        <list key="date_series_configuration"/><br>        <list key="date_series_configuration (interval)"/><br>        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/><br>        <parameter key="input_csv_text" value="product, retailer, price&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 550&#10;Product X, Retailer Y, 600&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 600&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 550&#10;Product Y, Retailer Y, 600&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 600"/><br>        <parameter key="column_separator" value=","/><br>        <parameter key="parse_all_as_nominal" value="false"/><br>        <parameter key="decimal_point_character" value="."/><br>        <parameter key="trim_attribute_names" value="true"/><br>      </operator><br>      <operator activated="true" class="generate_concatenation" compatibility="9.1.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="246" y="85"><br>        <parameter key="first_attribute" value="product"/><br>        <parameter key="second_attribute" value="retailer"/><br>        <parameter key="separator" value="_"/><br>        <parameter key="trim_values" value="false"/><br>      </operator><br>      <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="447" y="85"><br>        <parameter key="group_by_attribute" value="product_retailer"/><br>        <parameter key="group_by_attribute (numerical)" value=""/><br>        <parameter key="sorting_order" value="none"/><br>      </operator><br>      <operator activated="true" class="loop_collection" compatibility="9.1.000" expanded="true" height="82" name="Loop Collection" width="90" x="581" y="85"><br>        <parameter key="set_iteration_macro" value="false"/><br>        <parameter key="macro_name" value="iteration"/><br>        <parameter key="macro_start_value" value="1"/><br>        <parameter key="unfold" value="false"/><br>        <process expanded="true"><br>          <operator activated="true" class="operator_toolbox:generate_session_id" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Generate Session ID" width="90" x="112" y="34"><br>            <parameter key="date_attribute" value="price"/><br>            <parameter key="gap_threshold" value="0.0"/><br>            <parameter key="gap_unit" value="none"/><br>            <parameter key="use_absolutes" value="false"/><br>          </operator><br>          <operator activated="true" class="remove_duplicates" compatibility="9.1.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="246" y="34"><br>            <parameter key="attribute_filter_type" value="single"/><br>            <parameter key="attribute" value="Session id"/><br>            <parameter key="attributes" value=""/><br>            <parameter key="use_except_expression" value="false"/><br>            <parameter key="value_type" value="attribute_value"/><br>            <parameter key="use_value_type_exception" value="false"/><br>            <parameter key="except_value_type" value="time"/><br>            <parameter key="block_type" value="attribute_block"/><br>            <parameter key="use_block_type_exception" value="false"/><br>            <parameter key="except_block_type" value="value_matrix_row_start"/><br>            <parameter key="invert_selection" value="false"/><br>            <parameter key="include_special_attributes" value="true"/><br>            <parameter key="treat_missing_values_as_duplicates" value="false"/><br>          </operator><br>          <operator activated="true" class="aggregate" compatibility="9.1.000" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34"><br>            <parameter key="use_default_aggregation" value="false"/><br>            <parameter key="attribute_filter_type" value="all"/><br>            <parameter key="attribute" value=""/><br>            <parameter key="attributes" value=""/><br>            <parameter key="use_except_expression" value="false"/><br>            <parameter key="value_type" value="attribute_value"/><br>            <parameter key="use_value_type_exception" value="false"/><br>            <parameter key="except_value_type" value="time"/><br>            <parameter key="block_type" value="attribute_block"/><br>            <parameter key="use_block_type_exception" value="false"/><br>            <parameter key="except_block_type" value="value_matrix_row_start"/><br>            <parameter key="invert_selection" value="false"/><br>            <parameter key="include_special_attributes" value="false"/><br>            <parameter key="default_aggregation_function" value="average"/><br>            <list key="aggregation_attributes"><br>              <parameter key="product" value="count"/><br>            </list><br>            <parameter key="group_by_attributes" value="retailer|product"/><br>            <parameter key="count_all_combinations" value="false"/><br>            <parameter key="only_distinct" value="false"/><br>            <parameter key="ignore_missings" value="true"/><br>          </operator><br>          <connect from_port="single" to_op="Generate Session ID" to_port="exa"/><br>          <connect from_op="Generate Session ID" from_port="exa" to_op="Remove Duplicates" to_port="example set input"/><br>          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Aggregate" to_port="example set input"/><br>          <connect from_op="Aggregate" from_port="example set output" to_port="output 1"/><br>          <portSpacing port="source_single" spacing="0"/><br>          <portSpacing port="sink_output 1" spacing="0"/><br>          <portSpacing port="sink_output 2" spacing="0"/><br>        </process><br>      </operator><br>      <operator activated="true" class="append" compatibility="9.1.000" expanded="true" height="82" name="Append" width="90" x="715" y="85"><br>        <parameter key="datamanagement" value="double_array"/><br>        <parameter key="data_management" value="auto"/><br>        <parameter key="merge_type" value="all"/><br>      </operator><br>      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Concatenation" to_port="example set input"/><br>      <connect from_op="Generate Concatenation" from_port="example set output" to_op="Group Into Collection" to_port="exa"/><br>      <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/><br>      <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/><br>      <connect from_op="Append" from_port="merged set" to_port="result 1"/><br>      <portSpacing port="source_input 1" spacing="0"/><br>      <portSpacing port="sink_result 1" spacing="0"/><br>      <portSpacing port="sink_result 2" spacing="0"/><br>    </process><br>  </operator><br></process><br><br>


  • gianluca_scheid
    gianluca_scheid New Altair Community Member
    Hi Martin

    Thank you for your help. Unfortunately, I don't know (yet) how to implement the xml code into my process...


    Best regards
    Gianluca
  • varunm1
    varunm1 New Altair Community Member
    edited January 2019
    @gianluca_scheid

    You just need to open new process then go to menu bar View--> Show panel --> XML. Now you will get an XML window, delete code in XML window and just copy the complete code from @mschmitz and paste it there. Once you paste it click green tick mark so that you can see the processes in the process window. hope this helps

    Thanks,
    Varun
  • gianluca_scheid
    gianluca_scheid New Altair Community Member
    @varunm1 and @mschmitz: Thank you so much for your help :)
  • Telcontar120
    Telcontar120 New Altair Community Member
    You can also use the Lag operator in the Finance and Economics extension to calculate the price difference between rows---and any non-zero value is then a price change.  You could additionally aggregate on those if needed.
  • MartinLiebig
    MartinLiebig
    Altair Employee
    while this is of course technically possible, i would still recommend to use my options, since it is more memory efficient.
    BR,
    Martin
  • Telcontar120
    Telcontar120 New Altair Community Member
    Of course, @mschmitz, as is often the case I am simply pointing out another possible way of doing the same task.  There are always so many options in RapidMiner!