Difference of two dates?

I can't find an easy way to do this, though I suspect it should be a simple operation. Filter Examples doesn't seem to accept dates. When I convert my two attributes to integers and use the attribute_value_filter with "date1_day > date2_day", Filter Examples complains that the left-hand side attribute is not numerical (it is!). So I looked for something else and tried the Generate Aggregates function to create a new attribute that is either larger than zero or not, but that function only does sums and the like, whereas I need to subtract one number from the other.
As I think I'm beginning to overcomplicate the solution I would appreciate if someone could help me out with some hints.
Thanks
Answers
-
Hi SquirellX,
Yes, you are right, it is supposed to be an easy operation. Unfortunately it won't be until the next RM release (about mid-December).
In the next RM release the value filter will also be able to compare date(att1) < date(att2) or similar operations.
But you've been on a very good track, so close to the solution:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
<process expanded="true" height="521" width="748">
<operator activated="true" class="subprocess" compatibility="5.0.8" expanded="true" height="76" name="2 date attributes" width="90" x="45" y="75">
<process expanded="true" height="510" width="829">
<operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
<parameter key="number_examples" value="1000"/>
<parameter key="number_of_attributes" value="2"/>
<parameter key="attributes_lower_bound" value="0.0"/>
<parameter key="attributes_upper_bound" value="1.35555856E11"/>
</operator>
<operator activated="true" class="numerical_to_date" compatibility="5.0.8" expanded="true" height="76" name="Numerical to Date" width="90" x="246" y="30">
<parameter key="attribute_name" value="att1"/>
</operator>
<operator activated="true" class="numerical_to_date" compatibility="5.0.8" expanded="true" height="76" name="Numerical to Date (2)" width="90" x="380" y="30">
<parameter key="attribute_name" value="att2"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Numerical to Date" to_port="example set input"/>
<connect from_op="Numerical to Date" from_port="example set output" to_op="Numerical to Date (2)" to_port="example set input"/>
<connect from_op="Numerical to Date (2)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="date_to_numerical" compatibility="5.0.8" expanded="true" height="76" name="Date to Numerical" width="90" x="179" y="75">
<parameter key="attribute_name" value="att1"/>
<parameter key="millisecond_relative_to" value="epoch"/>
<parameter key="keep_old_attribute" value="true"/>
</operator>
<operator activated="true" class="date_to_numerical" compatibility="5.0.8" expanded="true" height="76" name="Date to Numerical (2)" width="90" x="313" y="75">
<parameter key="attribute_name" value="att2"/>
<parameter key="millisecond_relative_to" value="epoch"/>
<parameter key="keep_old_attribute" value="true"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="75">
<list key="function_descriptions">
<parameter key="diff" value="att1_millisecond -att2_millisecond"/>
</list>
</operator>
<operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples" width="90" x="648" y="75">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="diff &lt;0"/>
</operator>
<connect from_op="2 date attributes" from_port="out 1" to_op="Date to Numerical" to_port="example set input"/>
<connect from_op="Date to Numerical" from_port="example set output" to_op="Date to Numerical (2)" to_port="example set input"/>
<connect from_op="Date to Numerical (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="54"/>
<portSpacing port="sink_result 2" spacing="18"/>
</process>
</operator>
</process>
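For anyone who wants to check the logic outside RapidMiner, here is a minimal Python sketch of the same three steps: turn both dates into milliseconds since the epoch, subtract them, and keep the rows where the difference is negative. The sample dates and the helper name `to_epoch_millis` are made up for illustration; only the attribute names `att1`, `att2` and `diff` come from the process above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(dt):
    """Whole milliseconds since 1970-01-01 UTC (hypothetical helper)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

# Two illustrative rows; the dates are invented for this sketch.
rows = [
    {"att1": datetime(2010, 11, 1, tzinfo=timezone.utc),
     "att2": datetime(2010, 11, 5, tzinfo=timezone.utc)},
    {"att1": datetime(2010, 12, 24, tzinfo=timezone.utc),
     "att2": datetime(2010, 12, 1, tzinfo=timezone.utc)},
]

# Same idea as the Generate Attributes step: diff = att1 - att2 in milliseconds.
for row in rows:
    row["diff"] = to_epoch_millis(row["att1"]) - to_epoch_millis(row["att2"])

# Same idea as the Filter Examples step: keep rows where att1 is before att2.
kept = [row for row in rows if row["diff"] < 0]

for row in kept:
    print(row["att1"].date(), row["att2"].date(), row["diff"])
```

Working with integer milliseconds keeps the comparison exact, which is why the sketch uses integer division instead of floating-point timestamps.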
I hope I could help,
Sebastian
0 -
Thanks Sebastian, it works. Though I'm looking forward to the next release.
0 -
Hey, I also have a question regarding dates.
I have a list with user_ids and a user has multiple dates for example:
ID date
12 Fri Feb 06 15:16:07 CET 2004
12 Fri Feb 06 15:16:07 CET 2004
12 Mon Feb 09 19:16:03 CET 2004
12 Sat Feb 14 13:16:01 CET 2004
19 Wed Mar 06 19:30:09 CET 2004
19 Fri Feb 06 19:16:03 CET 2004
What is the expression for something like:
Look up each ID. Count the days from the first date to the last date for this ID, and if the count is more than 2, delete the data for this ID. Also, if the same date appears more than once for an ID, count it as only one day. In the example, 12 would be deleted and only 19 would stay.
With the current operator "Filter Examples" I found "condition class" to get "parameter expression" but I am not sure how to get the expression.
Does anyone have an idea?
Regards
MBM
0 -
Hi MBM,
I think you need quite a bit of aggregation here.
First aggregate and group by userID AND date, delete everything which has less than 2, and use Set Minus to remove it from the original data set. That should satisfy condition 2.
For the first condition: Is your data always sorted in time? In this case, you can aggregate min(date) and max(date), calculate date_diff and do the same filtering thing.
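As a rough illustration of the "same date counts only once" part, here is a small Python sketch that groups by ID and date and then drops every ID with more than 2 distinct days. This is one reading of the requirement in the question, not a translation of the RapidMiner operators; the threshold name `MAX_DAYS` is an assumption, and the timestamps are the sample rows from the question with the time zone left out.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question (time zone omitted for simplicity).
records = [
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 9, 19, 16, 3)),
    (12, datetime(2004, 2, 14, 13, 16, 1)),
    (19, datetime(2004, 3, 6, 19, 30, 9)),
    (19, datetime(2004, 2, 6, 19, 16, 3)),
]

MAX_DAYS = 2  # assumed threshold: IDs with more distinct days are dropped

# Group by ID and calendar day; a set keeps each day only once per ID.
days_per_id = defaultdict(set)
for user_id, timestamp in records:
    days_per_id[user_id].add(timestamp.date())

# Keep only the IDs that stay within the threshold, then filter the rows.
kept_ids = {uid for uid, days in days_per_id.items() if len(days) <= MAX_DAYS}
kept = [(uid, ts) for uid, ts in records if uid in kept_ids]

for uid, ts in kept:
    print(uid, ts)
```

With the sample data, ID 12 has three distinct days and is removed, while ID 19 has two and stays, which matches the outcome described in the question.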
Best.
Martin
0 -
Hey mschmitz,
I assume yes, my data should be sorted in time. I read an old thread here and first sorted by date and then by id. Now I have a huge list grouped by id, with the dates. I think your second suggestion makes sense. If I understand correctly, I need the minimum date and the maximum date of an id and then use date_diff to get the days. But how can I say "Give me the minimum date and the maximum date for a certain id"? For date_diff I need those two dates.
Thanks in advance
MBM
0 -
Hey MBM,
Use Aggregate to calculate min(date) and max(date), grouped by id. That should do the job.
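To make the idea concrete, here is a minimal Python sketch of "min(date) and max(date) grouped by id", followed by the difference in whole days. It only mirrors the logic; the names `span` and `diff_days` are made up for this example, and the rows are the sample data from the question without the time zone.

```python
from datetime import datetime

# Sample rows from the question (time zone omitted for simplicity).
records = [
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 6, 15, 16, 7)),
    (12, datetime(2004, 2, 9, 19, 16, 3)),
    (12, datetime(2004, 2, 14, 13, 16, 1)),
    (19, datetime(2004, 3, 6, 19, 30, 9)),
    (19, datetime(2004, 2, 6, 19, 16, 3)),
]

# Aggregate: track the earliest and latest timestamp for every id.
span = {}
for user_id, timestamp in records:
    first, last = span.get(user_id, (timestamp, timestamp))
    span[user_id] = (min(first, timestamp), max(last, timestamp))

# Difference between max(date) and min(date) in whole days per id.
diff_days = {uid: (last - first).days for uid, (first, last) in span.items()}

for uid, days in diff_days.items():
    print(uid, days)
```

In this sketch the rows do not have to be sorted, because min and max are taken over all rows of an id regardless of their order.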
~Martin
0 -
works fine
thank you!
0