Data transformation

SagioProject
Hi everybody,
For a university project I need to transform a dataset, but I really don't know how to do it in RapidMiner. Could you help me out?
The dataset contains event logs captured from a website. Attributes are timestamp, ip_address, browser_info and some other less important ones.
I generated a new attribute Date, which is only the date, without time.
Then I generated a new attribute Session_ID by concatenating the Date, ip_address and browser_info attributes.
The examples with the same Session_ID are events that occurred on the same day, from the same IP address and with the same browser.
What I now want to do is split up these sessions. If two successive events are 30 minutes or more apart, they should end up in two different groups. I want to do that by generating a new attribute Session_in_day, which can be 1, 2, 3, ... depending on which "smaller session" the example belongs to.
In MATLAB I was more or less able to write a program to do this, but I have no clue how to do it in RapidMiner. Anyone?
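For reference, here is a minimal Python sketch of the splitting rule described in the question. It is neither RapidMiner nor the MATLAB program mentioned. It assumes each event is a dict with a `Session_ID` string and a `timestamp` as a `datetime`; the function name `assign_session_in_day` is hypothetical. Events are sorted by Session_ID and then by timestamp, and within each Session_ID a gap of 30 minutes or more between consecutive events starts the next Session_in_day number.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Threshold from the question: a gap of 30 minutes or more starts a new sub-session.
GAP = timedelta(minutes=30)


def assign_session_in_day(events):
    """Hypothetical helper: events are dicts with 'Session_ID' and 'timestamp'.

    Returns new dicts sorted by (Session_ID, timestamp), each with an added
    'Session_in_day' counter that restarts at 1 for every Session_ID.
    """
    ordered = sorted(events, key=lambda e: (e["Session_ID"], e["timestamp"]))
    result = []
    # groupby works here because the list is already sorted by Session_ID.
    for _, group in groupby(ordered, key=lambda e: e["Session_ID"]):
        counter = 1
        previous = None
        for event in group:
            # Compare each event with the previous event of the same Session_ID.
            if previous is not None and event["timestamp"] - previous >= GAP:
                counter += 1
            result.append({**event, "Session_in_day": counter})
            previous = event["timestamp"]
    return result
```

Events of one Session_ID at 9:00, 9:10, 9:45 and 10:15 get Session_in_day values 1, 1, 2, 3: the 35-minute gap and the exactly 30-minute gap each start a new group.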
Answers
Hi
You can create a new attribute that indicates whether there was a pause or not.
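To do so, I would recommend using the Lag operator from the Time Series extension. Sort the data so that the events of each Session_ID are together and in time order (by Session_ID, then by timestamp), use the Lag operator to get a new column with the previous timestamp, and use Generate Attributes with
if(date_diff([timestamp-1], timestamp) >= XX, "jump", "nojump")
or whatever you are comfortable with. XX is the gap threshold; date_diff returns the difference in milliseconds, so 30 minutes is 1800000.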
Cheers,
Martin
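The lag-and-flag approach from the answer can be checked outside RapidMiner with a minimal Python sketch. It assumes rows are `(session_id, timestamp)` tuples already sorted by session and then by time; the function name `lag_flag_count` is hypothetical. It builds the lagged timestamp, sets the "jump"/"nojump" flag with a 30-minute threshold (using `>=` to match "30 minutes or more"), and turns the flags into a Session_in_day counter by adding 1 for every jump and restarting at 1 for each new session_id. That last counting step is not part of the answer; it is one possible way to get from the flag to the numbering asked for in the question.

```python
from datetime import datetime, timedelta

# 30 minutes expressed in milliseconds, mirroring a millisecond-based date difference.
GAP_MS = 30 * 60 * 1000


def lag_flag_count(rows):
    """Hypothetical helper: rows are (session_id, timestamp) tuples sorted by
    session_id, then timestamp.

    Returns (lagged_timestamp, flag, session_in_day) for every row.
    """
    out = []
    prev_session = prev_ts = None
    count = 0
    for session_id, ts in rows:
        # Lag step: use the previous timestamp only if it belongs to the same session.
        lagged = prev_ts if session_id == prev_session else None
        if lagged is None:
            flag = "nojump"
            count = 1  # first event of a new session_id restarts the counter
        else:
            diff_ms = (ts - lagged) / timedelta(milliseconds=1)
            flag = "jump" if diff_ms >= GAP_MS else "nojump"
            if flag == "jump":
                count += 1  # each jump opens the next sub-session
        out.append((lagged, flag, count))
        prev_session, prev_ts = session_id, ts
    return out
```

Restricting the lag to the same session_id avoids comparing the first event of a session with the last event of a different visitor, which is what happens if the data is sorted by timestamp alone.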