Loop through exampleset and identify a public holiday and then set all 24 hours to 1 in variable
Hi
I am endeavouring to solve this problem with preprocessing of a data set to create a new variable that flags all 24 hours of a day as a public holiday based on identifying a particular day flagged as a public holiday but in current data set this is only for first hour of the day but technically a public holiday should be flagged for all 24 hours of a day - this is in a data set measuring impact of a number of variables on traffic volumes. Can I use a loop operator and macro to do this??? Any help or advice would be most welcome
Regards Michael
I am endeavouring to solve this problem with preprocessing of a data set to create a new variable that flags all 24 hours of a day as a public holiday based on identifying a particular day flagged as a public holiday but in current data set this is only for first hour of the day but technically a public holiday should be flagged for all 24 hours of a day - this is in a data set measuring impact of a number of variables on traffic volumes. Can I use a loop operator and macro to do this??? Any help or advice would be most welcome
Regards Michael
Find more posts tagged with
Sort by:
1 - 6 of
61
Hi Balazs
Many thanks the data set has a date-time variable so yes can extract hour of day for days that are flagged as public holidays - holiday variable is either "None" or the name of the public holiday such as "Christmas Day" so it is a polynominal variable in RapidMiner
However the problem I have if that data set only flags the first hour of each public holiday so the other 23 hours for that holiday are incorrectly flagged as "None" so using the example above the first hour of Christmas day 25/12/2012 is flagged as "Christmas Day" the problem is that other 23 hours are not currently flagged as "Christmas Day" but rather "None".
I am thinking of using an if statement to identify a public holiday such as != "None" and use a macro and loop to set the other 23 hours to "Christmas Day" and run through the entire exampleset for each holiday could work, see snapshot of data set below
Regards Michael
Many thanks the data set has a date-time variable so yes can extract hour of day for days that are flagged as public holidays - holiday variable is either "None" or the name of the public holiday such as "Christmas Day" so it is a polynominal variable in RapidMiner
However the problem I have if that data set only flags the first hour of each public holiday so the other 23 hours for that holiday are incorrectly flagged as "None" so using the example above the first hour of Christmas day 25/12/2012 is flagged as "Christmas Day" the problem is that other 23 hours are not currently flagged as "Christmas Day" but rather "None".
I am thinking of using an if statement to identify a public holiday such as != "None" and use a macro and loop to set the other 23 hours to "Christmas Day" and run through the entire exampleset for each holiday could work, see snapshot of data set below
None | 275.91 | 0 | 0 | 64 | Clouds | broken clouds | 10/11/2013 23:00 | 925 |
Veterans Day | 275.44 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 0:00 | 514 |
None | 275.55 | 0 | 0 | 64 | Rain | light rain | 11/11/2013 1:00 | 286 |
None | 274.91 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 2:00 | 215 |
None | 273.9 | 0 | 0 | 64 | Snow | light snow | 11/11/2013 3:00 | 305 |
None | 271.36 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 4:00 | 783 |
None | 270.48 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 5:00 | 2576 |
None | 269.56 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 6:00 | 5669 |
None | 268.54 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 7:00 | 6170 |
None | 267.41 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 8:00 | 5510 |
None | 266.85 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 9:00 | 4653 |
None | 267.11 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 10:00 | 4167 |
None | 268.15 | 0 | 0 | 40 | Clouds | scattered clouds | 11/11/2013 11:00 | 4200 |
None | 269.27 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 13:00 | 4678 |
None | 269.71 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 14:00 | 4971 |
None | 269.2 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 15:00 | 5101 |
None | 268.68 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 16:00 | 5476 |
None | 267.63 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 17:00 | 5346 |
None | 266.68 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 18:00 | 3982 |
None | 265.77 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 19:00 | 2679 |
None | 265.14 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 20:00 | 2301 |
None | 264.93 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 21:00 | 1991 |
None | 264.44 | 0 | 0 | 64 | Clouds | broken clouds | 11/11/2013 22:00 | 1369 |
Regards Michael
Hi,
if you have the large table (the one with the timestamps) and create the date attribute, it will have 24 entries for each day, with the date.
Then you have a second table, I assume, with the holiday dates.
In this case you can simply join the two tables using the Join operator, selecting the date in both tables as the join attribute. This will assign the holiday to the date/time entries. Use "left join" to keep all entries of the timestamp table, then fill missing values in the resulting table with "None".
Regards,
Balázs
if you have the large table (the one with the timestamps) and create the date attribute, it will have 24 entries for each day, with the date.
Then you have a second table, I assume, with the holiday dates.
In this case you can simply join the two tables using the Join operator, selecting the date in both tables as the join attribute. This will assign the holiday to the date/time entries. Use "left join" to keep all entries of the timestamp table, then fill missing values in the resulting table with "None".
Regards,
Balázs
Hi Balázs
I am using date to nominal to reformat the date_time to date format MM/dd/yyyy and then I am using the Filter Examples operator to get the dates with days that are flagged as public holidays then as you suggest I am using join operator and doing left join on the date_time with full data set with date_time also reformatted to date format MM/dd/yyyy see rapidminer process screen capture now I am getting all dates across 24 hours for each holiday but I am not getting the full data set with other days listed as None in holiday variable regards Michael
I am using date to nominal to reformat the date_time to date format MM/dd/yyyy and then I am using the Filter Examples operator to get the dates with days that are flagged as public holidays then as you suggest I am using join operator and doing left join on the date_time with full data set with date_time also reformatted to date format MM/dd/yyyy see rapidminer process screen capture now I am getting all dates across 24 hours for each holiday but I am not getting the full data set with other days listed as None in holiday variable regards Michael

Hi,
you're sending the filtered list to the "left" input and the unfiltered one to the "right" input. But you're doing left outer join.
For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.
Check out this short video on Academy:
https://academy.rapidminer.com/learn/video/joining-and-cleansing-intro
Regards,
Balázs
you're sending the filtered list to the "left" input and the unfiltered one to the "right" input. But you're doing left outer join.
For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.
Check out this short video on Academy:
https://academy.rapidminer.com/learn/video/joining-and-cleansing-intro
Regards,
Balázs
Hi Balazs many thanks this sorted out my problem although I still needed to remove duplicate attributes and replace ? with None after watching the intro video on cleansing data I also used the trim operator and remove duplicates operator really appreciate your help with this regards Michael
Sort by:
1 - 2 of
21
Hi Balázs
I am using date to nominal to reformat the date_time to date format MM/dd/yyyy and then I am using the Filter Examples operator to get the dates with days that are flagged as public holidays then as you suggest I am using join operator and doing left join on the date_time with full data set with date_time also reformatted to date format MM/dd/yyyy see rapidminer process screen capture now I am getting all dates across 24 hours for each holiday but I am not getting the full data set with other days listed as None in holiday variable regards Michael
I am using date to nominal to reformat the date_time to date format MM/dd/yyyy and then I am using the Filter Examples operator to get the dates with days that are flagged as public holidays then as you suggest I am using join operator and doing left join on the date_time with full data set with date_time also reformatted to date format MM/dd/yyyy see rapidminer process screen capture now I am getting all dates across 24 hours for each holiday but I am not getting the full data set with other days listed as None in holiday variable regards Michael

Hi,
you're sending the filtered list to the "left" input and the unfiltered one to the "right" input. But you're doing left outer join.
For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.
Check out this short video on Academy:
https://academy.rapidminer.com/learn/video/joining-and-cleansing-intro
Regards,
Balázs
you're sending the filtered list to the "left" input and the unfiltered one to the "right" input. But you're doing left outer join.
For this setup you need a right join. This keeps everything from the right input and the matching data from the left input.
Check out this short video on Academy:
https://academy.rapidminer.com/learn/video/joining-and-cleansing-intro
Regards,
Balázs
how are your data structured? Do you have a list of public holidays (dates) and the timestamps (on hour level)?
You could try something like this in Generate Attributes to create an additional attribute with just the date part of your timestamp:
Balázs