compare values in current and previous row

sebisawa
sebisawa New Altair Community Member
edited November 5 in Community Q&A
Hi


I want to know intervals of purchases. For example, I have records like

- user01,2012/01/01
- user02,2012/01/02
- user01,2012/01/04
- user02,2012/01/06

How I can add "days since the last purchase"?  like

- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days


Tagged:

Answers

  • fras
    fras New Altair Community Member
    Hi,

    Operator "Generate Attribute" is useful. There are many ways to add new attributes.
    Especially you can manipulate date attributes with "date_add", "date_diff", "date_before", etc.
    Here is an example that adds 5 days to each date:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_sales_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Sales Data" width="90" x="45" y="75"/>
          <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="date"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="75">
            <list key="function_descriptions">
              <parameter key="date02" value="date_add(date, 5, DATE_UNIT_DAY)"/>
            </list>
          </operator>
          <connect from_op="Generate Sales Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • sebisawa
    sebisawa New Altair Community Member
    Hi,


    Thanks for your reply. What I expected is not how to calculate dates, but retrieve values from previous records and compare with the current ones.
    Isn't it possible?
  • fras
    fras New Altair Community Member
    Perhaps you should clarify our example. What do you mean by "X" ?
  • sebisawa
    sebisawa New Altair Community Member
    Sorry for confusion.

    - user01,2012/01/01,X
    - user02,2012/01/02,X
    - user01,2012/01/04,3days
    - user02,2012/01/06,4days

    What I would like to know is, how many days passed since the last purchases.
    Line 1 is the first record for user01, so we can't calculate the duration. I meant "X" as null.
    Line 2 is the same as Line 1.
    Line 3, it is the second purchase of user 01, the diff of 2012/01/01 and 2012/01/04 is "3 days".
    Line 4, it is the second purchase of user 02, the diff of 2012/01/02 and 2012/01/06 is "4 days".

    Does it make sense?

  • Hello

    Here's an example that uses the Lag operator that you could try.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="5.3.013" expanded="true" height="76" name="Subprocess" width="90" x="112" y="75">
            <process expanded="true">
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="111" y="30">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user1&quot;"/>
                  <parameter key="date" value="&quot;2013-08-01&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="112" y="120">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user1&quot;"/>
                  <parameter key="date" value="&quot;2013-08-08&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="112" y="210">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user2&quot;"/>
                  <parameter key="date" value="&quot;2013-08-01&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="112" y="300">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user2&quot;"/>
                  <parameter key="date" value="&quot;2013-08-06&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="130" name="Append" width="90" x="380" y="75"/>
              <operator activated="true" class="nominal_to_date" compatibility="5.3.013" expanded="true" height="76" name="Nominal to Date" width="90" x="581" y="75">
                <parameter key="attribute_name" value="date"/>
                <parameter key="date_format" value="yyyy-MM-dd"/>
              </operator>
              <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
              <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
              <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
              <connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append" to_port="example set 4"/>
              <connect from_op="Append" from_port="merged set" to_op="Nominal to Date" to_port="example set input"/>
              <connect from_op="Nominal to Date" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.3.013" expanded="true" height="76" name="Loop Values" width="90" x="112" y="165">
            <parameter key="attribute" value="user"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="5.3.013" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="75">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="user=%{loop_value}"/>
              </operator>
              <operator activated="true" class="series:lag_series" compatibility="5.3.000" expanded="true" height="76" name="Lag Series" width="90" x="380" y="75">
                <list key="attributes">
                  <parameter key="date" value="1"/>
                </list>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
              <connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="76" name="Append (2)" width="90" x="112" y="255"/>
          <operator activated="true" class="rename" compatibility="5.3.013" expanded="true" height="76" name="Rename" width="90" x="246" y="75">
            <parameter key="old_name" value="date-1"/>
            <parameter key="new_name" value="date1"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="165">
            <list key="function_descriptions">
              <parameter key="diff" value="date_diff(date1,date)/1000/3600/24"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="255">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|user|diff"/>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append (2)" to_port="example set 1"/>
          <connect from_op="Append (2)" from_port="merged set" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    regards

    Andrew
  • sebisawa
    sebisawa New Altair Community Member
    Hi Andrew


    Thanks for your post, however I can't execute it with RapidMiner 5.3.
    I could not find the operator named "Lag Series". Is this why?

    Aug 13, 2013 7:49:50 PM SEVERE: Process failed: The dummy operator Lag Series (replacing series:lag_series) cannot be executed.
  • wessel
    wessel New Altair Community Member
    The lag operator requires time series plugin.

    Maybe you want to experiment with the Script operator.
    Put this inside your script operator and find out what happens:



    ExampleSet es = operator.getInput(ExampleSet.class);

    es.recalculateAllAttributeStatistics();

    for (Attribute a : es.getAttributes()) {
        double mean = es.getStatistics(a, Statistics.AVERAGE);
        String name = a.getName();
        for (Example example : es) {
            example[name] = example[name] - mean;
        }
    }

    double last = 0;
    for (Example e : es) {
        e["a"] = e["id"] + last;
        last = e["id"];
    }

    int size = es.size()
    for (int i = 1; i < size; i++) {
    Example e1 = es.getExample(i-1);
    Example e0 = es.getExample(i);
    e0["b"] = e0["id"] + e1["id"];
    }


    return es;
  • wessel
    wessel New Altair Community Member
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="generate_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="driller oscillation timeseries"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="5.3.013" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
          <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
            <list key="function_descriptions">
              <parameter key="a" value="&quot;z&quot;"/>
              <parameter key="b" value="&quot;z&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="450" y="29">
            <parameter key="script" value="&#10;ExampleSet es = operator.getInput(ExampleSet.class);&#10;&#10;es.recalculateAllAttributeStatistics();&#10;&#10;for (Attribute a : es.getAttributes()) {&#10;    double mean = es.getStatistics(a, Statistics.AVERAGE);&#10;    String name = a.getName();&#10;    for (Example example : es) {&#10;        example[name] = example[name] - mean;&#10;    }&#10;}&#10;&#10;double last = 0;&#10;for (Example e : es) {&#10;    e[&quot;a&quot;] = e[&quot;id&quot;] + last;&#10;    last = e[&quot;id&quot;];&#10;}&#10;&#10;int size = es.size()&#10;for (int i = 1; i &lt; size; i++) {&#10;&#9;Example e1 = es.getExample(i-1);&#10;&#9;Example e0 = es.getExample(i);&#10;&#9;e0[&quot;b&quot;] = e0[&quot;id&quot;] + e1[&quot;id&quot;];&#9;&#9;&#10;}&#10;&#10;&#10;return es;"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • sebisawa
    sebisawa New Altair Community Member
    Hi wessel

    Thanks for your reply. "Execute Script" seems very helpful.
    Where is the document regarding to the scripting?
  • The time series extension is what you need for the Lag operator. 
  • sebisawa
    sebisawa New Altair Community Member
    Hi Andrew and Wessel

    Thanks for your kindly support.

    I could run the process from Andrew by downloading Series Extension from the marketplace.
    However it takes long time to finish processing. I assume it is because the examples must
    be filtered as much as users exists, it is not efficient. So I use this operator without "loop values"
    operator.

    Script one is fantastic and someday I would love to try.

    Best regards