Hi Fellas,
I am new to RapidMiner.
The very first task I am trying to do requires identifying the earliest date in a large dataset containing 8 date attributes and a few thousand examples. Because the data quality is low, some date values are missing and some are mistyped.
So far, I have created a process that identifies the earliest of the 8 dates and generates a new attribute called 'Earliest date'. I still have the 8 original date attributes, but will use only this new 9th one, as a nominal value, for further transformation of the dataset.
Before doing that, I need to filter out the invalid values from the 8 attributes, such as -1 or dates in the future like 2071. Without proper cleansing, the 9th attribute is wrong in a few cases, returning 1899 and similar values.
Is there any way to filter out these dates, preferably without repeating the same operator 8 times? Perhaps outlier detection for dates, or something close to that? I am not familiar with macros yet, but perhaps that is the missing piece…
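To make the logic I am after concrete, here is a rough sketch in plain Python rather than RapidMiner. The column names (date_1 to date_8), the lower bound of 1900 and the helper names are all made up for illustration; my real attributes and cutoffs differ.

```python
from datetime import date

# Assumed plausibility window (hypothetical): anything outside it is treated as invalid.
MIN_VALID = date(1900, 1, 1)
MAX_VALID = date.today()

# Hypothetical names for the 8 date attributes.
DATE_COLUMNS = [f"date_{i}" for i in range(1, 9)]


def clean(value):
    """Return the value if it is a plausible date, otherwise None."""
    if not isinstance(value, date):
        return None  # covers missing values and placeholders such as -1
    if value < MIN_VALID or value > MAX_VALID:
        return None  # too old (e.g. 1899) or in the future (e.g. 2071)
    return value


def earliest_date(row, date_columns):
    """Apply the same cleaning to every date column, then take the minimum."""
    valid = [d for d in (clean(row.get(col)) for col in date_columns) if d is not None]
    return min(valid) if valid else None


# One made-up example row with the kinds of bad values described above.
row = {
    "date_1": date(2015, 3, 14),
    "date_2": -1,
    "date_3": date(2071, 1, 1),
    "date_4": date(1899, 12, 30),
    "date_5": None,
    "date_6": date(2012, 7, 2),
    "date_7": date(2018, 11, 20),
    "date_8": date(2013, 1, 5),
}

print(earliest_date(row, DATE_COLUMNS))  # 2012-07-02
```

The point is that one cleaning rule is written once and applied to all 8 attributes in a loop, which is what I would like to reproduce in RapidMiner.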
Thanks!