Access next row data
Hello.
I'm working on my final university project and have a few questions.
First, I'd like to know whether there is a way to access data in the next row.
To access the previous row's data I used the Lag Series operator, but I can't find a way to do the same for the next row.
My data looks like this:
Discussion  Userid  Parent  Created  Modified
1           1       0       12       14
1           2       82      15       16
1           1       85      17       20
1           3       85      22       24
2           45      0       26       32
2           48      89      33       34
2           46      90      34       35
I want to calculate, for each userid, the difference modified(i+1) - created(i).
Parent = 0 means the row is the first message in a discussion.
With that I want to calculate how much time passes between a user's message and the response to it.
For the first row I want: 1 1 0 (16-12)=4
How can I do that? Is there a way to know which row is the last message of a discussion? How can I mark the row that comes just before a row with parent=0?
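To make the last question concrete, here is a minimal sketch in plain Python (not a RapidMiner process) of marking the last message of each discussion. It assumes the rows are already in message order and that every discussion starts with a parent = 0 row; the function name `flag_last_in_discussion` is made up for illustration.

```python
# Columns: (discussion, userid, parent, created, modified), copied from the table above.
rows = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]

def flag_last_in_discussion(rows):
    """Return one boolean per row: True if the row is the last message of its discussion."""
    flags = []
    for i in range(len(rows)):
        is_final_row = i == len(rows) - 1
        # Assumption: a row closes a discussion when the next row has parent == 0.
        next_starts_new = (not is_final_row) and rows[i + 1][2] == 0
        flags.append(is_final_row or next_starts_new)
    return flags

last_flags = flag_last_in_discussion(rows)
print(last_flags)
```

With the sample data this marks the fourth and the seventh row.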
Best Answer
-
You can generate temporary unique ids using the Generate ID operator upstream and do the joins downstream. Then you can use a Select Attributes with invert toggled on to select that ID column attribute out. I do this all the time.
2
Answers
-
If you reverse the Sort order of your dataset then you should be able to use Lag again for this.
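The idea can be checked outside RapidMiner with a small plain-Python sketch: a lag on the reversed order, reversed back, behaves like a "lead" (next-row) lookup. The helper names `lag` and `lead_via_reversed_lag` are hypothetical and only mimic what a lag does; they are not RapidMiner functions.

```python
def lag(values, k=1):
    """Shift values down by k positions, padding the start with None."""
    k = min(k, len(values))
    return [None] * k + values[:len(values) - k]

def lead_via_reversed_lag(values, k=1):
    """Get the next-row value by reversing, lagging, and reversing back."""
    return lag(values[::-1], k)[::-1]

# Modified column of discussion 1 from the question.
modified = [14, 16, 20, 24]
next_modified = lead_via_reversed_lag(modified)
print(next_modified)
```

The last row has no next row, so it ends up with a missing value (None here).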
0 -
I don't understand. Could you explain in more detail?
0 -
hello @bea11005 - perhaps this will help.
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Untitled 6" width="90" x="45" y="34">
<parameter key="repository_entry" value="//RapidMiner OneDrive/random community stuff/Untitled 6"/>
</operator>
<operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Discussion"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values" width="90" x="313" y="34">
<parameter key="attribute" value="Discussion"/>
<parameter key="iteration_macro" value="id"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="Discussion.equals.%{id}"/>
</list>
</operator>
<operator activated="true" class="sort" compatibility="7.6.001" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
<parameter key="attribute_name" value="Discussion"/>
</operator>
<operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception" width="90" x="313" y="34">
<process expanded="true">
<operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="112" y="34">
<list key="attributes">
<parameter key="Created" value="1"/>
</list>
</operator>
<connect from_port="in 1" to_op="Lag Series" to_port="example set input"/>
<connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
<list key="function_descriptions">
<parameter key="DIFFERENCE" value="Modified-[Created-1]"/>
</list>
</operator>
<connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Handle Exception" to_port="in 1"/>
<connect from_op="Handle Exception" from_port="out 1" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Untitled 6" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
<connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Basically you need to Sort each discussion first, then Lag. See my process.
Scott
0 -
I can't use Loop Values: the process finishes without producing any output.
0 -
Telcontar120, I can't reverse the order twice on the Modified attribute, because if I do, the messages change their order and my process would no longer be correct.
0 -
Hi,
If you have unique keys (IDs) in your example set, you can create a copy of it using Multiply, sort that the way you want, generate the required attribute, and join back based on the ID.
Regards,
Balázs
1 -
I don't have unique IDs, so I can't do that.
The other thing I want to know is whether it's possible to split my data by the value of the Discussion attribute.
I want to calculate the difference between messages until I reach the last message of a discussion, where the difference will be 0 because there is no next message. I need modified(i+1)-created(i) for every message except the last one in a discussion.
I've tried Loop Values but I can't get any output from the process. How can I do both things?
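As a reference for what the result should look like, here is a hedged plain-Python sketch of the calculation described above. It follows the worked example from the first post (next row in the same discussion, whatever its userid), assumes the rows are already ordered by discussion and message order, and uses 0 for the last message of each discussion. The function name `response_gaps` is made up for illustration.

```python
from itertools import groupby

# Columns: (discussion, userid, parent, created, modified), from the first post.
rows = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]

def response_gaps(rows):
    """Per row: modified of the next row in the same discussion minus own created."""
    result = []
    # groupby only groups neighbouring rows, so the input must be ordered by discussion.
    for _, group in groupby(rows, key=lambda r: r[0]):
        msgs = list(group)
        for i, (disc, user, parent, created, _modified) in enumerate(msgs):
            if i + 1 < len(msgs):
                gap = msgs[i + 1][4] - created
            else:
                gap = 0  # last message of the discussion: no next message
            result.append((disc, user, parent, gap))
    return result

gaps = response_gaps(rows)
for line in gaps:
    print(line)
```

The first output row is (1, 1, 0, 4), matching the (16-12)=4 example from the question.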
0 -
You can generate temporary unique ids using the Generate ID operator upstream and do the joins downstream. Then you can use a Select Attributes with invert toggled on to select that ID column attribute out. I do this all the time.
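A rough plain-Python sketch of that pattern, only to show the data flow: add a temporary id, derive the new value on a reordered copy, join back on the id, then drop the id. All function names are hypothetical, and only discussion 1 is used so the example does not cross a discussion boundary.

```python
# Discussion 1 from the question: (discussion, userid, parent, created, modified)
rows = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
]

def add_temp_ids(rows):
    """Prefix each row with a running id (the role a generated id column plays)."""
    return [(i,) + row for i, row in enumerate(rows, start=1)]

def next_modified_by_id(id_rows):
    """On a reversed copy the previous row is the original next row; map id -> its modified."""
    reversed_copy = id_rows[::-1]
    lookup = {}
    for pos, row in enumerate(reversed_copy):
        # Index 5 is the modified column once the id has been prefixed.
        lookup[row[0]] = reversed_copy[pos - 1][5] if pos > 0 else None
    return lookup

def join_and_drop_id(id_rows, lookup):
    """Join the derived value back on the id, then remove the id column again."""
    return [row[1:] + (lookup[row[0]],) for row in id_rows]

id_rows = add_temp_ids(rows)
joined = join_and_drop_id(id_rows, next_modified_by_id(id_rows))
for line in joined:
    print(line)
```

Because the join is keyed on the id, the original row order is preserved no matter how the copy was sorted.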
2 -
Oh, that's a good idea. I'll try generating the IDs; it looks like it will work.
Now I'd like to know how to split the data by the value of the Discussion attribute.
1 -
Hi!
Loop Values is the operator you need. Inside the loop you can access the current value with the %{loop_value} macro by default. See the attached example:
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.6.003" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.6.003" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34">
<parameter key="attribute" value="label"/>
<parameter key="enable_parallel_execution" value="false"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="7.6.003" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="label.equals.%{loop_value}"/>
</list>
</operator>
<connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Make sure that "Enable parallel execution" is switched off.
Also, the loop attribute needs to be nominal. You can either create a copy of your original attribute and convert that to nominal (with Numerical to Polynominal or Format Numbers) or just convert the original if you don't need it in the numeric format later.
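To illustrate what the loop does conceptually, here is a small plain-Python sketch (not RapidMiner code): each distinct value of the attribute, treated as text, yields one filtered subset, much like one loop iteration with a filter inside. The function name `split_by_value` and the sample rows are assumptions for illustration.

```python
# Columns: (discussion, userid, parent, created, modified), as in the question.
rows = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]

def split_by_value(rows, column):
    """Return {value as text: matching rows}, one subset per distinct value."""
    distinct = []
    for row in rows:
        label = str(row[column])  # stand-in for the numeric-to-nominal conversion
        if label not in distinct:
            distinct.append(label)
    # One filtered subset per distinct value, in order of first appearance.
    return {label: [r for r in rows if str(r[column]) == label] for label in distinct}

subsets = split_by_value(rows, column=0)
for label, subset in subsets.items():
    print(label, len(subset))
```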
Regards,
Balázs
1