🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

Count Price Changes

gianluca_scheidUser: "gianluca_scheid"
New Altair Community Member
Updated by Jocelyn
Dear RapidMiner Community

I have a big data set with product name, retailer name and price at different points in time (time stamps).
My goal is to group the data by product and retailer name and count the number of price changes across all time stamps.

Is there an easy way to do this?

Thank you for your help

Best regards
GL

Find more posts tagged with

Sort by:
1 - 9 of 91
    i think you can just use the Aggregate operator. Group by product name and relatailer and use count on price to get the #.
    BR,
    Martin
    Hi Martin

    Thank you for your suggestion.

    I had already considered this option, however, it will only count the number of distinct prices but not the number of price changes:

    For example:

    t1: Product X, Retailer Y, 500USD
    t2: Product X, Retailer Y, 500USD
    t3: Product X, Retailer Y, 550USD
    t4: Product X, Retailer Y, 600USD
    t5: Product X, Retailer Y, 500USD
    t6: Product X, Retailer Y, 600USD

    When counting prices using the Aggregate function I'd get: Product X, Retailer Y, 3 distinct prices (500, 550, 600)
    However, the price changed 4 times (500 to 550, 550 to 600, 600 to 500, 500 to 600)

    Best regards
    Gianluca
    Ahh!
    The attached process should do the trick. It needs Operator Toolbox to run. Sadly i have no time to add additional comments right now. I hope this works for you.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"><br>  <context><br>    <input/><br>    <output/><br>    <macros/><br>  </context><br>  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"><br>    <parameter key="logverbosity" value="init"/><br>    <parameter key="random_seed" value="2001"/><br>    <parameter key="send_mail" value="never"/><br>    <parameter key="notification_email" value=""/><br>    <parameter key="process_duration_for_mail" value="30"/><br>    <parameter key="encoding" value="SYSTEM"/><br>    <process expanded="true"><br>      <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.8.000-SNAPSHOT" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="85"><br>        <parameter key="generator_type" value="comma_separated_text"/><br>        <parameter key="number_of_examples" value="100"/><br>        <parameter key="use_stepsize" value="false"/><br>        <list key="function_descriptions"/><br>        <parameter key="add_id_attribute" value="false"/><br>        <list key="numeric_series_configuration"/><br>        <list key="date_series_configuration"/><br>        <list key="date_series_configuration (interval)"/><br>        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/><br>        <parameter key="input_csv_text" value="product, retailer, price&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 550&#10;Product X, Retailer Y, 600&#10;Product X, Retailer Y, 500&#10;Product X, Retailer Y, 600&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 550&#10;Product Y, Retailer Y, 600&#10;Product Y, Retailer Y, 500&#10;Product Y, Retailer Y, 600"/><br>        <parameter key="column_separator" value=","/><br>        <parameter key="parse_all_as_nominal" value="false"/><br>        <parameter key="decimal_point_character" value="."/><br>        <parameter key="trim_attribute_names" value="true"/><br>      </operator><br>      <operator activated="true" class="generate_concatenation" compatibility="9.1.000" expanded="true" height="82" name="Generate Concatenation" width="90" x="246" y="85"><br>        <parameter key="first_attribute" value="product"/><br>        <parameter key="second_attribute" value="retailer"/><br>        <parameter key="separator" value="_"/><br>        <parameter key="trim_values" value="false"/><br>      </operator><br>      <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="447" y="85"><br>        <parameter key="group_by_attribute" value="product_retailer"/><br>        <parameter key="group_by_attribute (numerical)" value=""/><br>        <parameter key="sorting_order" value="none"/><br>      </operator><br>      <operator activated="true" class="loop_collection" compatibility="9.1.000" expanded="true" height="82" name="Loop Collection" width="90" x="581" y="85"><br>        <parameter key="set_iteration_macro" value="false"/><br>        <parameter key="macro_name" value="iteration"/><br>        <parameter key="macro_start_value" value="1"/><br>        <parameter key="unfold" value="false"/><br>        <process expanded="true"><br>          <operator activated="true" class="operator_toolbox:generate_session_id" compatibility="1.8.000-SNAPSHOT" expanded="true" height="82" name="Generate Session ID" width="90" x="112" y="34"><br>            <parameter key="date_attribute" value="price"/><br>            <parameter key="gap_threshold" value="0.0"/><br>            <parameter key="gap_unit" value="none"/><br>            <parameter key="use_absolutes" value="false"/><br>          </operator><br>          <operator activated="true" class="remove_duplicates" compatibility="9.1.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="246" y="34"><br>            <parameter key="attribute_filter_type" value="single"/><br>            <parameter key="attribute" value="Session id"/><br>            <parameter key="attributes" value=""/><br>            <parameter key="use_except_expression" value="false"/><br>            <parameter key="value_type" value="attribute_value"/><br>            <parameter key="use_value_type_exception" value="false"/><br>            <parameter key="except_value_type" value="time"/><br>            <parameter key="block_type" value="attribute_block"/><br>            <parameter key="use_block_type_exception" value="false"/><br>            <parameter key="except_block_type" value="value_matrix_row_start"/><br>            <parameter key="invert_selection" value="false"/><br>            <parameter key="include_special_attributes" value="true"/><br>            <parameter key="treat_missing_values_as_duplicates" value="false"/><br>          </operator><br>          <operator activated="true" class="aggregate" compatibility="9.1.000" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34"><br>            <parameter key="use_default_aggregation" value="false"/><br>            <parameter key="attribute_filter_type" value="all"/><br>            <parameter key="attribute" value=""/><br>            <parameter key="attributes" value=""/><br>            <parameter key="use_except_expression" value="false"/><br>            <parameter key="value_type" value="attribute_value"/><br>            <parameter key="use_value_type_exception" value="false"/><br>            <parameter key="except_value_type" value="time"/><br>            <parameter key="block_type" value="attribute_block"/><br>            <parameter key="use_block_type_exception" value="false"/><br>            <parameter key="except_block_type" value="value_matrix_row_start"/><br>            <parameter key="invert_selection" value="false"/><br>            <parameter key="include_special_attributes" value="false"/><br>            <parameter key="default_aggregation_function" value="average"/><br>            <list key="aggregation_attributes"><br>              <parameter key="product" value="count"/><br>            </list><br>            <parameter key="group_by_attributes" value="retailer|product"/><br>            <parameter key="count_all_combinations" value="false"/><br>            <parameter key="only_distinct" value="false"/><br>            <parameter key="ignore_missings" value="true"/><br>          </operator><br>          <connect from_port="single" to_op="Generate Session ID" to_port="exa"/><br>          <connect from_op="Generate Session ID" from_port="exa" to_op="Remove Duplicates" to_port="example set input"/><br>          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Aggregate" to_port="example set input"/><br>          <connect from_op="Aggregate" from_port="example set output" to_port="output 1"/><br>          <portSpacing port="source_single" spacing="0"/><br>          <portSpacing port="sink_output 1" spacing="0"/><br>          <portSpacing port="sink_output 2" spacing="0"/><br>        </process><br>      </operator><br>      <operator activated="true" class="append" compatibility="9.1.000" expanded="true" height="82" name="Append" width="90" x="715" y="85"><br>        <parameter key="datamanagement" value="double_array"/><br>        <parameter key="data_management" value="auto"/><br>        <parameter key="merge_type" value="all"/><br>      </operator><br>      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Concatenation" to_port="example set input"/><br>      <connect from_op="Generate Concatenation" from_port="example set output" to_op="Group Into Collection" to_port="exa"/><br>      <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/><br>      <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/><br>      <connect from_op="Append" from_port="merged set" to_port="result 1"/><br>      <portSpacing port="source_input 1" spacing="0"/><br>      <portSpacing port="sink_result 1" spacing="0"/><br>      <portSpacing port="sink_result 2" spacing="0"/><br>    </process><br>  </operator><br></process><br><br>


    Hi Martin

    Thank you for your help. Unfortunately, I don't know (yet) how to implement the xml code into my process...


    Best regards
    Gianluca
    varunm1User: "varunm1"
    New Altair Community Member
    Updated by varunm1
    @gianluca_scheid

    You just need to open new process then go to menu bar View--> Show panel --> XML. Now you will get an XML window, delete code in XML window and just copy the complete code from @mschmitz and paste it there. Once you paste it click green tick mark so that you can see the processes in the process window. hope this helps

    Thanks,
    Varun
    @varunm1 and @mschmitz: Thank you so much for your help :)
    You can also use the Lag operator in the Finance and Economics extension to calculate the price difference between rows---and any non-zero value is then a price change.  You could additionally aggregate on those if needed.
    while this is of course technically possible, i would still recommend to use my options, since it is more memory efficient.
    BR,
    Martin
    Of course, @mschmitz, as is often the case I am simply pointing out another possible way of doing the same task.  There are always so many options in RapidMiner!