<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
    <process expanded="true" height="521" width="748">
      <operator activated="true" class="subprocess" compatibility="5.0.8" expanded="true" height="76" name="2 date attributes" width="90" x="45" y="75">
        <process expanded="true" height="510" width="829">
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
            <parameter key="number_examples" value="1000"/>
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
            <parameter key="attributes_upper_bound" value="1.35555856E11"/>
          </operator>
          <operator activated="true" class="numerical_to_date" compatibility="5.0.8" expanded="true" height="76" name="Numerical to Date" width="90" x="246" y="30">
            <parameter key="attribute_name" value="att1"/>
          </operator>
          <operator activated="true" class="numerical_to_date" compatibility="5.0.8" expanded="true" height="76" name="Numerical to Date (2)" width="90" x="380" y="30">
            <parameter key="attribute_name" value="att2"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Numerical to Date" to_port="example set input"/>
          <connect from_op="Numerical to Date" from_port="example set output" to_op="Numerical to Date (2)" to_port="example set input"/>
          <connect from_op="Numerical to Date (2)" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="date_to_numerical" compatibility="5.0.8" expanded="true" height="76" name="Date to Numerical" width="90" x="179" y="75">
        <parameter key="attribute_name" value="att1"/>
        <parameter key="millisecond_relative_to" value="epoch"/>
        <parameter key="keep_old_attribute" value="true"/>
      </operator>
      <operator activated="true" class="date_to_numerical" compatibility="5.0.8" expanded="true" height="76" name="Date to Numerical (2)" width="90" x="313" y="75">
        <parameter key="attribute_name" value="att2"/>
        <parameter key="millisecond_relative_to" value="epoch"/>
        <parameter key="keep_old_attribute" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="75">
        <list key="function_descriptions">
          <parameter key="diff" value="att1_millisecond -att2_millisecond"/>
        </list>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.0.8" expanded="true" height="76" name="Filter Examples" width="90" x="648" y="75">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="diff &lt;0"/>
      </operator>
      <connect from_op="2 date attributes" from_port="out 1" to_op="Date to Numerical" to_port="example set input"/>
      <connect from_op="Date to Numerical" from_port="example set output" to_op="Date to Numerical (2)" to_port="example set input"/>
      <connect from_op="Date to Numerical (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="54"/>
      <portSpacing port="sink_result 2" spacing="18"/>
    </process>
  </operator>
</process>
Hey, I also have a question regarding dates.
I have a list of user IDs, and each user can have multiple dates, for example:
ID  date
12  Fri Feb 06 15:16:07 CET 2004
12  Fri Feb 06 15:16:07 CET 2004
12  Mon Feb 09 19:16:03 CET 2004
12  Sat Feb 14 13:16:01 CET 2004
19  Wed Mar 06 19:30:09 CET 2004
19  Fri Feb 06 19:16:03 CET 2004
What is the expression for something like:
For each ID, count the days from its first date to its last date; if the sum is more than 2, delete the data for that ID. Also consider that if the same date appears more than once for an ID, it should count as only one day. In the example, 12 would be deleted and only 19 would stay.
In the "Filter Examples" operator I found the "condition class" setting that leads to a "parameter expression", but I am not sure how to write the expression.
Does anyone have an idea?
Regards
MBM
Hi MBM,
I think you need quite a bit of aggregation here.
First, aggregate and group by userID AND date, delete everything that has less than 2, and use Set Minus to delete it from the original data set. That should satisfy condition 2.
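To make the "same date counts only once" idea concrete outside of RapidMiner, here is a minimal Python sketch, not a RapidMiner process. It collapses repeated days per ID and keeps only IDs with at most 2 distinct days. The rows are copied from the question; the function names (`parse_day`, `distinct_days_per_id`, `keep_ids_with_at_most`) are made up for illustration, and the sketch assumes the time zone token can simply be ignored.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question: (user id, timestamp text).
ROWS = [
    (12, "Fri Feb 06 15:16:07 CET 2004"),
    (12, "Fri Feb 06 15:16:07 CET 2004"),
    (12, "Mon Feb 09 19:16:03 CET 2004"),
    (12, "Sat Feb 14 13:16:01 CET 2004"),
    (19, "Wed Mar 06 19:30:09 CET 2004"),
    (19, "Fri Feb 06 19:16:03 CET 2004"),
]


def parse_day(text):
    """Return the calendar day of a 'Fri Feb 06 15:16:07 CET 2004' style string."""
    # Assumption: weekday and time zone tokens are ignored, not validated.
    _weekday, month, day, clock, _zone, year = text.split()
    stamp = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return stamp.date()


def distinct_days_per_id(rows):
    """Group by id and count each calendar day only once."""
    days = defaultdict(set)
    for user_id, text in rows:
        days[user_id].add(parse_day(text))
    return {user_id: len(seen) for user_id, seen in days.items()}


def keep_ids_with_at_most(rows, max_days=2):
    """Drop every row whose id has more than max_days distinct days."""
    counts = distinct_days_per_id(rows)
    return [row for row in rows if counts[row[0]] <= max_days]


counts = distinct_days_per_id(ROWS)
kept = keep_ids_with_at_most(ROWS)
kept_ids = sorted({user_id for user_id, _ in kept})
print(counts)
print(kept_ids)
```

On the sample rows, ID 12 has three distinct days and is dropped, while ID 19 has two and stays, which matches the outcome described in the question.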
For the first condition: Is your data always sorted in time? In this case, you can aggregate min(date) and max(date), calculate date_diff and do the same filtering thing.
Best.
Martin
Hey mschmitz,
I assume yes, my data should be sorted in time. I read an old thread here and first sorted by date and then by id. Now I have a huge list grouped by id, with the dates. I think your second suggestion makes sense. If I understand correctly, I need the minimum date and the maximum date of an id and then use date_diff to get the days. But how can I say "Give me the minimum date and the maximum date for a certain id"? For date_diff I need those two dates.
Thanks in advance
Hey MBM,
Using Aggregate to calculate min(date) and max(date), grouped by id, should do the job.
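For readers who want to see the grouping logic spelled out, here is a rough Python sketch of the same idea (min and max date per id, then the difference in days). It is not RapidMiner code; the helper names (`parse_timestamp`, `first_and_last`, `span_in_days`) are hypothetical, the rows are the ones from the question, and the time zone token is assumed to be ignorable. The sketch does not rely on the rows being sorted.

```python
from datetime import datetime

# Sample rows copied from the question: (user id, timestamp text).
ROWS = [
    (12, "Fri Feb 06 15:16:07 CET 2004"),
    (12, "Fri Feb 06 15:16:07 CET 2004"),
    (12, "Mon Feb 09 19:16:03 CET 2004"),
    (12, "Sat Feb 14 13:16:01 CET 2004"),
    (19, "Wed Mar 06 19:30:09 CET 2004"),
    (19, "Fri Feb 06 19:16:03 CET 2004"),
]


def parse_timestamp(text):
    """Parse a 'Fri Feb 06 15:16:07 CET 2004' style string into a datetime."""
    # Assumption: weekday and time zone tokens are ignored, not validated.
    _weekday, month, day, clock, _zone, year = text.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def first_and_last(rows):
    """Equivalent of min(date) and max(date) grouped by id."""
    bounds = {}
    for user_id, text in rows:
        stamp = parse_timestamp(text)
        low, high = bounds.get(user_id, (stamp, stamp))
        bounds[user_id] = (min(low, stamp), max(high, stamp))
    return bounds


def span_in_days(bounds):
    """Whole calendar days between the first and last date of each id."""
    return {
        user_id: (high.date() - low.date()).days
        for user_id, (low, high) in bounds.items()
    }


bounds = first_and_last(ROWS)
spans = span_in_days(bounds)
print(spans)
```

Note that the span between first and last date is a different measure from the number of distinct days: on the sample rows, ID 19 has only two distinct days but its first and last dates are 29 days apart, so it is worth checking which of the two the filter should use.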
~Martin
works fine
thank you!