Hi,
I'd like to report a bug in the conversion of the date data type from Python back to a RapidMiner (RM) dataset:
1. The dataset contains a column 'Date' with type date and role 'id'. The column behaves as a date should (sorting, plotting, extracting the max date with a macro).
2. The column is converted to pandas properly: its dtype is datetime64[ns], and all the common pandas datetime operations work as expected. The metadata for the column also looks correct: 'Date': ('date', 'id')
3. After being passed back to RapidMiner, the column contains only missing values (the number of examples is the same, but every value in this column is missing). This happens even if the code is only:
import pandas

def rm_main(data):
    return data
I went through the documentation for the extension to find out whether there is anything specific about dates, but there does not seem to be. I also created a completely new, empty project with a trivial dataset (just two columns, one for the date and one for dummy data), but the values in the date column are always missing when received by RM.
I'm just exploring RapidMiner, so I'm sorry if I'm missing something obvious. But if the conversion from RM to Python works, it seems to me that the opposite direction should work as well.
I'm using Python 3.5.3, pandas 0.19.2 and numpy 1.12.0 from Anaconda on a Mac.
RapidMiner Studio 7.4.000
Python Scripting 7.4.0
Best regards
Jiri
Update
The conversion from pandas to dataset works when the date column is made timezone-aware with tz_localize in the Python script.
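For anyone hitting the same problem, here is a minimal sketch of that workaround. It assumes the date column is named 'Date' as in my dataset, and the 'UTC' timezone is only a placeholder I picked for illustration; I have not checked which timezone RapidMiner actually expects, so adjust it to your setup.

```python
import pandas as pd

def rm_main(data):
    # Assumption: the date column is called 'Date', as in the report above.
    # Assumption: 'UTC' is a placeholder; use the timezone that fits your setup.
    if data['Date'].dt.tz is None:
        # Only naive (timezone-unaware) columns can be localized.
        data['Date'] = data['Date'].dt.tz_localize('UTC')
    return data
```

The check on `.dt.tz` is there so the script does not fail if the column is already timezone-aware, since tz_localize raises an error in that case.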
From my experiments it seems that all date and date_time types in RapidMiner are timezone-aware no matter what, so would it be possible to localize them right away during the conversion from dataset to pandas?
One final suggestion: why not create a naive, timezone-unaware date type in RapidMiner?
Best regards
Jiri