Loop through exampleset and identify a public holiday and then set all 24 hours to 1 in variable
lanem
New Altair Community Member
Hi
I am endeavouring to solve this problem with preprocessing of a data set to create a new variable that flags all 24 hours of a day as a public holiday based on identifying a particular day flagged as a public holiday but in current data set this is only for first hour of the day but technically a public holiday should be flagged for all 24 hours of a day - this is in a data set measuring impact of a number of variables on traffic volumes. Can I use a loop operator and macro to do this??? Any help or advice would be most welcome
Regards Michael
I am endeavouring to solve this problem with preprocessing of a data set to create a new variable that flags all 24 hours of a day as a public holiday based on identifying a particular day flagged as a public holiday but in current data set this is only for first hour of the day but technically a public holiday should be flagged for all 24 hours of a day - this is in a data set measuring impact of a number of variables on traffic volumes. Can I use a loop operator and macro to do this??? Any help or advice would be most welcome
Regards Michael
Tagged:
0
Best Answers
-
Hi Balázs
I am using date to nominal to reformat the date_time to date format MM/dd/yyyy and then I am using the Filter Examples operator to get the dates with days that are flagged as public holidays then as you suggest I am using join operator and doing left join on the date_time with full data set with date_time also reformatted to date format MM/dd/yyyy see rapidminer process screen capture now I am getting all dates across 24 hours for each holiday but I am not getting the full data set with other days listed as None in holiday variable regards Michael0 -
Hi,
you're sending the filtered list to the "left" input and the unfiltered one to the "right" input. But you're doing left outer join.
For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.
Check out this short video on Academy:
https://academy.rapidminer.com/learn/video/joining-and-cleansing-intro
Regards,
Balázs1
Answers
-
Hi,
how are your data structured? Do you have a list of public holidays (dates) and the timestamps (on hour level)?
You could try something like this in Generate Attributes to create an additional attribute with just the date part of your timestamp:<div>date_add(timestamp_attribute, <span style="background-color: transparent; color: inherit; font-size: inherit;"> <font face="Open Sans, sans-serif">- date_get(timestamp_attribute, DATE_UNIT_HOUR), </font> </span><span style="background-color: transparent; color: inherit; font-size: inherit;"> </span><span style="background-color: transparent; color: inherit; font-size: inherit; font-family: "Open Sans", sans-serif;">DATE_UNIT_HOUR)</span></div>
Best regards,
Balázs2 -
Hi Balazs
Many thanks the data set has a date-time variable so yes can extract hour of day for days that are flagged as public holidays - holiday variable is either "None" or the name of the public holiday such as "Christmas Day" so it is a polynominal variable in RapidMiner
However the problem I have if that data set only flags the first hour of each public holiday so the other 23 hours for that holiday are incorrectly flagged as "None" so using the example above the first hour of Christmas day 25/12/2012 is flagged as "Christmas Day" the problem is that other 23 hours are not currently flagged as "Christmas Day" but rather "None".
I am thinking of using an if statement to identify a public holiday such as != "None" and use a macro and loop to set the other 23 hours to "Christmas Day" and run through the entire exampleset for each holiday could work, see snapshot of data set belowNone 275.91 0 0 64 Clouds broken clouds 10/11/2013 23:00 925 Veterans Day 275.44 0 0 64 Clouds broken clouds 11/11/2013 0:00 514 None 275.55 0 0 64 Rain light rain 11/11/2013 1:00 286 None 274.91 0 0 64 Clouds broken clouds 11/11/2013 2:00 215 None 273.9 0 0 64 Snow light snow 11/11/2013 3:00 305 None 271.36 0 0 64 Clouds broken clouds 11/11/2013 4:00 783 None 270.48 0 0 64 Clouds broken clouds 11/11/2013 5:00 2576 None 269.56 0 0 64 Clouds broken clouds 11/11/2013 6:00 5669 None 268.54 0 0 64 Clouds broken clouds 11/11/2013 7:00 6170 None 267.41 0 0 64 Clouds broken clouds 11/11/2013 8:00 5510 None 266.85 0 0 64 Clouds broken clouds 11/11/2013 9:00 4653 None 267.11 0 0 64 Clouds broken clouds 11/11/2013 10:00 4167 None 268.15 0 0 40 Clouds scattered clouds 11/11/2013 11:00 4200 None 269.27 0 0 64 Clouds broken clouds 11/11/2013 13:00 4678 None 269.71 0 0 64 Clouds broken clouds 11/11/2013 14:00 4971 None 269.2 0 0 64 Clouds broken clouds 11/11/2013 15:00 5101 None 268.68 0 0 64 Clouds broken clouds 11/11/2013 16:00 5476 None 267.63 0 0 64 Clouds broken clouds 11/11/2013 17:00 5346 None 266.68 0 0 64 Clouds broken clouds 11/11/2013 18:00 3982 None 265.77 0 0 64 Clouds broken clouds 11/11/2013 19:00 2679 None 265.14 0 0 64 Clouds broken clouds 11/11/2013 20:00 2301 None 264.93 0 0 64 Clouds broken clouds 11/11/2013 21:00 1991 None 264.44 0 0 64 Clouds broken clouds 11/11/2013 22:00 1369
Regards Michael0 -
Hi,
if you have the large table (the one with the timestamps) and create the date attribute, it will have 24 entries for each day, with the date.
Then you have a second table, I assume, with the holiday dates.
In this case you can simply join the two tables using the Join operator, selecting the date in both tables as the join attribute. This will assign the holiday to the date/time entries. Use "left join" to keep all entries of the timestamp table, then fill missing values in the resulting table with "None".
Regards,
Balázs2 -
Hi Balázs
I am using date to nominal to reformat the date_time to date format MM/dd/yyyy and then I am using the Filter Examples operator to get the dates with days that are flagged as public holidays then as you suggest I am using join operator and doing left join on the date_time with full data set with date_time also reformatted to date format MM/dd/yyyy see rapidminer process screen capture now I am getting all dates across 24 hours for each holiday but I am not getting the full data set with other days listed as None in holiday variable regards Michael0 -
Hi,
you're sending the filtered list to the "left" input and the unfiltered one to the "right" input. But you're doing left outer join.
For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.
Check out this short video on Academy:
https://academy.rapidminer.com/learn/video/joining-and-cleansing-intro
Regards,
Balázs1 -
Hi Balazs many thanks this sorted out my problem although I still needed to remove duplicate attributes and replace ? with None after watching the intro video on cleansing data I also used the trim operator and remove duplicates operator really appreciate your help with this regards Michael
1