I have a csv file of dates and temperatures.
For example:
dateTime, TMP
2018-08-17 06:41:46, 17.4
2018-08-17 06:42:29, 17.5
2018-08-17 06:43:12, 17.5
2018-08-17 06:43:54, 17.7
2018-08-17 06:44:37, 17.7
This data is quite dense so I'd like to down-sample it from seconds to hours and assign the mean of all values in that hour to the new feature so that it looks something like this:
dateTime, TMP.hourlyavg
2018-08-17 06:00:00, 16.9
2018-08-17 07:00:00, 17.5
2018-08-17 08:00:00, 17.6
2018-08-17 09:00:00, 18.7
2018-08-17 10:00:00, 19.7
However, I have two main constraints.
I can't use the Process Windows Operator because my data is not consistent enough (i.e. some hours might have had 15 sensor readings whereas other hours might have 12 or 20).
I also don't want to preserve the current date format instead of converting it into numeric/nominal.
So my question is, how can I downsample from seconds to hours and taking the mean value per hour given these constraints:
1. Inconsistent number of records per hour
2. Preserve date format
A secondary yet significant problem is that there are entire hours that are missing due to the sensor being taken offline for maintenance so downsampling might produce something like.
dateTime, TMP.hourlyavg
2018-08-17 06:00:00, 17.4
2018-08-17 07:00:00, 17.5
2018-08-17 08:00:00, NaN
2018-08-17 09:00:00, NaN
2018-08-17 10:00:00, 19.7
I'm hoping I'll be able to use the Replace Missing Values Operator or Moving Average Operator to replace NaNs that might occur from the downsampling process. Advice or recommendations on this step are appreciated too.
I looked at this thread but the proposed solution is too complicated for me to follow being new to RapidMiner
community(dot)rapidminer(dot)com/discussion/54821/time-series-gaps-for-arima-how-to-fill-them