Access next row data

bea11005
bea11005 New Altair Community Member
edited November 5 in Community Q&A

Hello,

I'm working on my final university project and have a few questions.

First, I'd like to know whether there is a way to access data in the next row.

To access the previous row's data I used the Lag Series operator, but I can't find a way to do the same for the next row.

 

My data is like this:

Discussion  Userid  Parent  Created  Modified
1           1       0       12       14
1           2       82      15       16
1           1       85      17       20
1           3       85      22       24
2           45      0       26       32
2           48      89      33       34
2           46      90      34       35

For each userid, I want to calculate the difference modified(i+1) - created(i).

Parent=0 means that the row is the first message of a discussion.

With this I want to measure how much time passes between a user's message and the response to it.

For the first row I expect: 1 1 0 (16-12)=4

How can I do that? Is there a way to tell which row is the last message of a discussion? And how can I mark the row that comes directly before a row with parent=0?

Answers

  • Telcontar120
    Telcontar120 New Altair Community Member

    If you reverse the Sort order of your dataset then you should be able to use Lag again for this.
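To illustrate the idea outside RapidMiner, here is a minimal Python sketch using the rows from the question. The `lag` helper is a hypothetical stand-in for what a lag operator does (shift values down by one row); it is not RapidMiner code.

```python
# Sample rows from the question: (Discussion, Userid, Parent, Created, Modified)
rows = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]


def lag(values):
    """Shift values down by one row, padding the top with None."""
    return [None] + values[:-1]


modified = [r[4] for r in rows]

# Reverse the order, lag by one, then reverse back: each row now sees
# the Modified value of the row that FOLLOWS it in the original order.
next_modified = lag(modified[::-1])[::-1]

# modified(i+1) - created(i); None where there is no next row.
diff = [None if nm is None else nm - r[3] for r, nm in zip(rows, next_modified)]
print(diff)  # [4, 5, 7, 10, 8, 2, None]
```

Note that the fourth value (10) is computed across the boundary between discussion 1 and discussion 2, so the discussions still need to be handled separately.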

  • bea11005
    bea11005 New Altair Community Member

    I don't understand... could you explain in more detail?

  • sgenzer
    sgenzer
    Altair Employee

    hello @bea11005 - perhaps this will help.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Untitled 6" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//RapidMiner OneDrive/random community stuff/Untitled 6"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Discussion"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values" width="90" x="313" y="34">
    <parameter key="attribute" value="Discussion"/>
    <parameter key="iteration_macro" value="id"/>
    <process expanded="true">
    <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Discussion.equals.%{id}"/>
    </list>
    </operator>
    <operator activated="true" class="sort" compatibility="7.6.001" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
    <parameter key="attribute_name" value="Discussion"/>
    </operator>
    <operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="112" y="34">
    <list key="attributes">
    <parameter key="Created" value="1"/>
    </list>
    </operator>
    <connect from_port="in 1" to_op="Lag Series" to_port="example set input"/>
    <connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <process expanded="true">
    <connect from_port="in 1" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
    <list key="function_descriptions">
    <parameter key="DIFFERENCE" value="Modified-[Created-1]"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Handle Exception" to_port="in 1"/>
    <connect from_op="Handle Exception" from_port="out 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Untitled 6" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
    <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Basically you need to Sort each discussion first, then Lag.  See my process.
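For readers without RapidMiner at hand, the same logic can be sketched in plain Python. This is only an approximation of the process above: it assumes the rows of each discussion should be ordered by Created (the process itself sorts on Discussion), and it mirrors the `Created-1` and `DIFFERENCE` attribute names from the XML.

```python
from itertools import groupby

COLUMNS = ("Discussion", "Userid", "Parent", "Created", "Modified")
DATA = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

# groupby only groups adjacent rows, so order by discussion first
# (and by Created inside each discussion, which is an assumption here).
rows.sort(key=lambda r: (r["Discussion"], r["Created"]))

result = []
for discussion, group in groupby(rows, key=lambda r: r["Discussion"]):
    previous_created = None  # the first row of a discussion has no predecessor
    for row in group:
        row = dict(row)
        row["Created-1"] = previous_created  # lagged Created value
        row["DIFFERENCE"] = (
            None if previous_created is None
            else row["Modified"] - previous_created
        )
        previous_created = row["Created"]
        result.append(row)

print([r["DIFFERENCE"] for r in result])  # [None, 4, 5, 7, None, 8, 2]
```

The difference requested in the question, modified(i+1) - created(i), ends up on row i+1 here, and the lag never crosses from one discussion into the next.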


    Scott

     

  • bea11005
    bea11005 New Altair Community Member

    I can't get Loop Values to work... the process finishes without producing any output.

  • sgenzer
    sgenzer
    Altair Employee

    hello @bea11005 - I'd recommend posting your XML process here (see "Read Before Posting" on right when you reply) and attach your dataset. This way we can replicate what you're doing and help you better.

     

    Scott

     

  • bea11005
    bea11005 New Altair Community Member

    @Telcontar120 I can't reverse the order twice on the Modified attribute, because if I do, the messages change their order and my process would no longer be correct.

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member

    Hi,

     

    if you have unique keys (IDs) in your example set, you can create a copy of it using Multiply, sort that the way you want, generate the required attribute, and join back based on the ID.

     

    Regards,

    Balázs

  • bea11005
    bea11005 New Altair Community Member

    I don't have unique IDs, so I can't do that.

    The other thing I'd like to know is whether it's possible to split my data depending on the value of the Discussion attribute.

    I want to calculate the difference between messages until I reach the last message of a discussion, where the difference will be 0 because there is no next message. I need modified(i+1) - created(i) for all messages except the last one in a discussion.

    I've tried Loop Values, but I can't get any output from the process. How can I do both things?

     

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    Answer ✓

    You can generate temporary unique IDs using the Generate ID operator upstream and do the joins downstream. Afterwards, use Select Attributes with "invert selection" toggled on to drop the temporary ID attribute again. I do this all the time.
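The pattern described here (temporary ID, work on a reordered copy, join back, drop the ID) can be sketched in plain Python. The three rows and the `next_Modified` attribute name are illustrative assumptions, not part of any RapidMiner process.

```python
# Example rows without a unique key (first discussion from the question).
rows = [
    {"Discussion": 1, "Created": 12, "Modified": 14},
    {"Discussion": 1, "Created": 15, "Modified": 16},
    {"Discussion": 1, "Created": 17, "Modified": 20},
]

# 1. Generate a temporary unique id (1, 2, 3, ...) for every row.
with_id = [{"id": i, **row} for i, row in enumerate(rows, start=1)]

# 2. Work on a reordered copy: reversed order, then a one-row "lag".
reordered = sorted(with_id, key=lambda r: r["id"], reverse=True)
next_modified = {}
previous = None
for r in reordered:
    next_modified[r["id"]] = previous
    previous = r["Modified"]

# 3. Join the derived attribute back onto the original rows by id,
#    so the original row order is untouched.
joined = [{**r, "next_Modified": next_modified[r["id"]]} for r in with_id]

# 4. Drop the temporary id again.
final = [{k: v for k, v in r.items() if k != "id"} for r in joined]

print([r["next_Modified"] for r in final])  # [16, 20, None]
```

Because the join is keyed on the temporary ID, the copy can be sorted in any order without disturbing the original message order.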

  • bea11005
    bea11005 New Altair Community Member

    Oh, that's a good idea! I'll try generating the IDs; it looks like it will work.

    Now I'd like to know how to split the data depending on the value of the Discussion attribute.

  • BalazsBaranyRM
    BalazsBaranyRM New Altair Community Member

    Hi!

     

    Loop Values is the operator you need. Inside the loop you can access the current value with the %{loop_value} macro by default. See the attached example:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.003" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="concurrency:loop_values" compatibility="7.6.003" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34">
    <parameter key="attribute" value="label"/>
    <parameter key="enable_parallel_execution" value="false"/>
    <process expanded="true">
    <operator activated="true" class="filter_examples" compatibility="7.6.003" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="label.equals.%{loop_value}"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Loop Values" to_port="input 1"/>
    <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Make sure that "Enable parallel execution" is switched off.

    Also, the loop attribute needs to be nominal. You can either create a copy of your original attribute and convert that to nominal (with Numerical to Polynominal or Format Numbers) or just convert the original if you don't need it in the numeric format later.
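As a rough illustration of what Loop Values plus Filter Examples does, here is a plain-Python sketch on the data from the question. It is an assumption-laden analogy rather than RapidMiner code: the string conversion stands in for making the attribute nominal, and the final step applies the rule requested earlier in the thread (modified(i+1) - created(i), with 0 for the last message of a discussion).

```python
# Rows from the question: (Discussion, Userid, Parent, Created, Modified)
rows = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]

# The loop attribute has to be nominal, so treat Discussion as a string.
nominal = [str(r[0]) for r in rows]

subsets = {}
for value in dict.fromkeys(nominal):  # distinct values, in first-seen order
    # Same idea as Filter Examples with Discussion = %{loop_value}
    subsets[value] = [r for r, d in zip(rows, nominal) if d == value]

# Per discussion: modified(i+1) - created(i), and 0 for the last message.
diffs = {
    value: [nxt[4] - cur[3] for cur, nxt in zip(subset, subset[1:])] + [0]
    for value, subset in subsets.items()
}

print(diffs)  # {'1': [4, 5, 7, 0], '2': [8, 2, 0]}
```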

     

    Regards,

    Balázs