Data transformation

SagioProject New Altair Community Member
edited November 2024 in Community Q&A
Hi everybody,

For a university project I need to transform a dataset, but I really don't know how to do it in RapidMiner. Could you help me out?

The dataset contains event logs captured from a website. Attributes are timestamp, ip_address, browser_info and some other less important ones.

I generated a new attribute Date, which is only the date, without time.

Then I generated a new attribute Session_ID by concatenating the Date, ip_address and browser_info attributes.

The examples with the same Session_ID are events that occurred on the same day, by the same ip address and by the same browser.
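
For readers who want to check the logic outside RapidMiner, the Date and Session_ID derivation described above can be sketched in plain Python. The attribute names come from the post; the `|` separator, the sample values and the `make_session_id` helper are assumptions made only for illustration.

```python
from datetime import datetime


def make_session_id(timestamp: datetime, ip_address: str, browser_info: str) -> str:
    """Concatenate the date part of the timestamp with IP and browser (hypothetical helper)."""
    date = timestamp.date().isoformat()  # the Date attribute: date only, no time
    return "|".join([date, ip_address, browser_info])  # "|" is an assumed separator


# Made-up sample event, not taken from the real dataset
event = {
    "timestamp": datetime(2024, 11, 5, 9, 15),
    "ip_address": "10.0.0.1",
    "browser_info": "Firefox",
}
session_id = make_session_id(
    event["timestamp"], event["ip_address"], event["browser_info"]
)
```

Using an explicit separator avoids two different combinations of values accidentally concatenating to the same string.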

What I now want to do is split up these sessions. If there is a gap of 30 minutes or more between two successive events, I want them to be split into two different groups. I want to do that by generating a new attribute Session_in_day, which takes the values 1, 2, 3, ... according to the "smaller session" the example is in.

In MATLAB I was more or less able to write a program to do this, but I have no clue how to do it in RapidMiner. Anyone?

Answers

  • MartinLiebig
    Altair Employee
    Hi

    You can create a new attribute that indicates whether there was a pause or not.
    To do so, I would recommend the Lag operator from the Time Series extension. Sort by timestamp, use the Lag operator to get a new column with the previous timestamp, and use Generate Attributes with

    if(date_diff(timestamp,timestamp-1)>XX,"jump","nojump")
    or whatever you are comfortable with.
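
The same idea (sort, look at the previous timestamp, flag a jump) can be sketched in plain Python to make the logic concrete. This is not RapidMiner code and not necessarily how the operators work internally. It also adds one step the answer does not spell out: counting the jumps within each Session_ID to obtain Session_in_day. The `add_session_in_day` function and the sample events are hypothetical, and the 30-minute threshold is taken from the question.

```python
from datetime import datetime, timedelta
from itertools import groupby

GAP = timedelta(minutes=30)  # threshold from the question: "30 minutes or more"


def add_session_in_day(events):
    """Return copies of the events with a Session_in_day counter per Session_ID."""
    # Sort by Session_ID first, then by time, so each group is in time order.
    ordered = sorted(events, key=lambda e: (e["Session_ID"], e["timestamp"]))
    result = []
    for _, group in groupby(ordered, key=lambda e: e["Session_ID"]):
        previous = None  # plays the role of the lagged timestamp
        counter = 1
        for event in group:
            # A "jump": the gap to the previous event is 30 minutes or more.
            if previous is not None and event["timestamp"] - previous >= GAP:
                counter += 1
            result.append({**event, "Session_in_day": counter})
            previous = event["timestamp"]
    return result


# Made-up sample events, for illustration only
events = [
    {"Session_ID": "A", "timestamp": datetime(2024, 11, 5, 9, 0)},
    {"Session_ID": "A", "timestamp": datetime(2024, 11, 5, 9, 10)},
    {"Session_ID": "A", "timestamp": datetime(2024, 11, 5, 9, 40)},
    {"Session_ID": "B", "timestamp": datetime(2024, 11, 5, 9, 5)},
]
labelled = add_session_in_day(events)
```

Resetting the lagged timestamp at the start of each Session_ID group matters: otherwise the first event of one session would be compared with the last event of a different one.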


    Cheers,

    Martin
