compare values in current and previous row
sebisawa
New Altair Community Member
Hi
I want to know intervals of purchases. For example, I have records like
- user01,2012/01/01
- user02,2012/01/02
- user01,2012/01/04
- user02,2012/01/06
How I can add "days since the last purchase"? like
- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days
I want to know intervals of purchases. For example, I have records like
- user01,2012/01/01
- user02,2012/01/02
- user01,2012/01/04
- user02,2012/01/06
How I can add "days since the last purchase"? like
- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days
Tagged:
0
Answers
-
Hi,
Operator "Generate Attribute" is useful. There are many ways to add new attributes.
Especially you can manipulate date attributes with "date_add", "date_diff", "date_before", etc.
Here is an example that adds 5 days to each date:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_sales_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Sales Data" width="90" x="45" y="75"/>
<operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="75">
<list key="function_descriptions">
<parameter key="date02" value="date_add(date, 5, DATE_UNIT_DAY)"/>
</list>
</operator>
<connect from_op="Generate Sales Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
0 -
Hi,
Thanks for your reply. What I expected is not how to calculate dates, but retrieve values from previous records and compare with the current ones.
Isn't it possible?0 -
Perhaps you should clarify our example. What do you mean by "X" ?0
-
Sorry for confusion.
- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days
What I would like to know is, how many days passed since the last purchases.
Line 1 is the first record for user01, so we can't calculate the duration. I meant "X" as null.
Line 2 is the same as Line 1.
Line 3, it is the second purchase of user 01, the diff of 2012/01/01 and 2012/01/04 is "3 days".
Line 4, it is the second purchase of user 02, the diff of 2012/01/02 and 2012/01/06 is "4 days".
Does it make sense?
0 -
Hello
Here's an example that uses the Lag operator that you could try.<?xml version="1.0" encoding="UTF-8" standalone="no"?>
regards
<process version="5.3.013">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="5.3.013" expanded="true" height="76" name="Subprocess" width="90" x="112" y="75">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="111" y="30">
<list key="attribute_values">
<parameter key="user" value=""user1""/>
<parameter key="date" value=""2013-08-01""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="112" y="120">
<list key="attribute_values">
<parameter key="user" value=""user1""/>
<parameter key="date" value=""2013-08-08""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="112" y="210">
<list key="attribute_values">
<parameter key="user" value=""user2""/>
<parameter key="date" value=""2013-08-01""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="112" y="300">
<list key="attribute_values">
<parameter key="user" value=""user2""/>
<parameter key="date" value=""2013-08-06""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="130" name="Append" width="90" x="380" y="75"/>
<operator activated="true" class="nominal_to_date" compatibility="5.3.013" expanded="true" height="76" name="Nominal to Date" width="90" x="581" y="75">
<parameter key="attribute_name" value="date"/>
<parameter key="date_format" value="yyyy-MM-dd"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
<connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
<connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
<connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append" to_port="example set 4"/>
<connect from_op="Append" from_port="merged set" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="loop_values" compatibility="5.3.013" expanded="true" height="76" name="Loop Values" width="90" x="112" y="165">
<parameter key="attribute" value="user"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="5.3.013" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="75">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="user=%{loop_value}"/>
</operator>
<operator activated="true" class="series:lag_series" compatibility="5.3.000" expanded="true" height="76" name="Lag Series" width="90" x="380" y="75">
<list key="attributes">
<parameter key="date" value="1"/>
</list>
</operator>
<connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
<connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="76" name="Append (2)" width="90" x="112" y="255"/>
<operator activated="true" class="rename" compatibility="5.3.013" expanded="true" height="76" name="Rename" width="90" x="246" y="75">
<parameter key="old_name" value="date-1"/>
<parameter key="new_name" value="date1"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="165">
<list key="function_descriptions">
<parameter key="diff" value="date_diff(date1,date)/1000/3600/24"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="255">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|user|diff"/>
</operator>
<connect from_op="Subprocess" from_port="out 1" to_op="Loop Values" to_port="example set"/>
<connect from_op="Loop Values" from_port="out 1" to_op="Append (2)" to_port="example set 1"/>
<connect from_op="Append (2)" from_port="merged set" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Andrew0 -
Hi Andrew
Thanks for your post, however I can't execute it with RapidMiner 5.3.
I could not find the operator named "Lag Series". Is this why?
Aug 13, 2013 7:49:50 PM SEVERE: Process failed: The dummy operator Lag Series (replacing series:lag_series) cannot be executed.0 -
The lag operator requires time series plugin.
Maybe you want to experiment with the Script operator.
Put this inside your script operator and find out what happens:
ExampleSet es = operator.getInput(ExampleSet.class);
es.recalculateAllAttributeStatistics();
for (Attribute a : es.getAttributes()) {
double mean = es.getStatistics(a, Statistics.AVERAGE);
String name = a.getName();
for (Example example : es) {
example[name] = example[name] - mean;
}
}
double last = 0;
for (Example e : es) {
e["a"] = e["id"] + last;
last = e["id"];
}
int size = es.size()
for (int i = 1; i < size; i++) {
Example e1 = es.getExample(i-1);
Example e0 = es.getExample(i);
e0["b"] = e0["id"] + e1["id"];
}
return es;0 -
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="generate_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="driller oscillation timeseries"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_id" compatibility="5.3.013" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
<operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
<list key="function_descriptions">
<parameter key="a" value=""z""/>
<parameter key="b" value=""z""/>
</list>
</operator>
<operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="450" y="29">
<parameter key="script" value=" ExampleSet es = operator.getInput(ExampleSet.class); es.recalculateAllAttributeStatistics(); for (Attribute a : es.getAttributes()) { double mean = es.getStatistics(a, Statistics.AVERAGE); String name = a.getName(); for (Example example : es) { example[name] = example[name] - mean; } } double last = 0; for (Example e : es) { e["a"] = e["id"] + last; last = e["id"]; } int size = es.size() for (int i = 1; i < size; i++) { 	Example e1 = es.getExample(i-1); 	Example e0 = es.getExample(i); 	e0["b"] = e0["id"] + e1["id"];		 } return es;"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
<connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
0 -
Hi wessel
Thanks for your reply. "Execute Script" seems very helpful.
Where is the document regarding to the scripting?0 -
The time series extension is what you need for the Lag operator.0
-
Hi Andrew and Wessel
Thanks for your kindly support.
I could run the process from Andrew by downloading Series Extension from the marketplace.
However it takes long time to finish processing. I assume it is because the examples must
be filtered as much as users exists, it is not efficient. So I use this operator without "loop values"
operator.
Script one is fantastic and someday I would love to try.
Best regards0