Why does a correctly formatted Excel date import as a long row of nominal numbers?
SkyTrader
New Altair Community Member
Hi there,
Wondering why my correctly formatted (short date) in Excel (dd/mm/yyyy) imports as a long row of nominal numbers when using xlsx?
I've seen a huge amount of coverage on this but nothing that addresses the reason why a string of (meaningless to me!) nominal numbers would somehow be easier to interpret? Particularly if your reason for using RM is to predict the future price of assets, i.e. so when looking within Data you'd want to see dates not a string of number. I did try the operator to converts nomiunal to date but it did nothing and I still was left with nominal numbers?
I've only been able to get dates to appear if I first convert my xlsx into a csv file first. What are the reasons/advantages/disadvantages to either importing an xlsx file or a csv into RM (apart from, in the case of a csv, to be able to see actual dates)?
Cheers,
Wondering why my correctly formatted (short date) in Excel (dd/mm/yyyy) imports as a long row of nominal numbers when using xlsx?
I've seen a huge amount of coverage on this but nothing that addresses the reason why a string of (meaningless to me!) nominal numbers would somehow be easier to interpret? Particularly if your reason for using RM is to predict the future price of assets, i.e. so when looking within Data you'd want to see dates not a string of number. I did try the operator to converts nomiunal to date but it did nothing and I still was left with nominal numbers?
I've only been able to get dates to appear if I first convert my xlsx into a csv file first. What are the reasons/advantages/disadvantages to either importing an xlsx file or a csv into RM (apart from, in the case of a csv, to be able to see actual dates)?
Cheers,
0
Best Answers
-
This is difficult to diagnose without seeing that data and the specific formatting problem. I am able to import a formatted date into RapidMiner from Excel in an xlsx spreadsheet without a problem. Did you make sure the date format parameter in the Read Excel operator is correctly set to your format?
Sometimes if RapidMiner has problems with automatic imports it is better to simply set the field to be a polynominal upon import (which will not transform it) and then use one of the date transformation operators in your ETL process, which will give you more control over how it is interpreted.5 -
Without seeing your data and the configuration of your RapidMiner process it is impossible to diagnose what may be happening here. As I mentioned above, you should make sure that the date format parameter actually matches up with the way the date is showing in Excel as the default option may not be a correct match.
In terms of general points, the advantages of importing an xlsx file are that there is a lot of embedded metadata regarding data types and such that will be read automatically by RapidMiner, which can save you time and hassle in your subsequent ETL. It is also the case that a lot of data in the real world natively exists in Excel workbooks. The disadvantage is that sometimes the extra encoding can lead to errors or unexpected outcomes (like the problem you are trying to troubleshoot now).
The advantages of a csv is that it is basically just a text file with minimal additional encoding, so you don't usually have the same potential for things to go awry. In fact you can choose to import the entire file as polynominal data and then deal with all your subsequent data conversions manually within RapidMiner using the ETL operators which gives you a much finer degree of control. The disadvantage is that you might have to convert your original data source into a csv file first if it originated in Excel or some other format, and that doing that extra data ETL for conversions, etc. inside RapidMiner can be a bit of a pain.5
Answers
-
This is difficult to diagnose without seeing that data and the specific formatting problem. I am able to import a formatted date into RapidMiner from Excel in an xlsx spreadsheet without a problem. Did you make sure the date format parameter in the Read Excel operator is correctly set to your format?
Sometimes if RapidMiner has problems with automatic imports it is better to simply set the field to be a polynominal upon import (which will not transform it) and then use one of the date transformation operators in your ETL process, which will give you more control over how it is interpreted.5 -
Thanks Telecontar120, I'll take another look when I've finished the current project I'm working on.0
-
Oddly I can now import xlsx files today using the repository/import data and the dates show as Jan 3, 2000 now. I don't know why as I haven't changed the Excel file (I just used the original extension I always use which is xlsx)?
What are the reasons/advantages/disadvantages to either importing an xlsx file or a csv into RM?
Cheers,
0 -
Without seeing your data and the configuration of your RapidMiner process it is impossible to diagnose what may be happening here. As I mentioned above, you should make sure that the date format parameter actually matches up with the way the date is showing in Excel as the default option may not be a correct match.
In terms of general points, the advantages of importing an xlsx file are that there is a lot of embedded metadata regarding data types and such that will be read automatically by RapidMiner, which can save you time and hassle in your subsequent ETL. It is also the case that a lot of data in the real world natively exists in Excel workbooks. The disadvantage is that sometimes the extra encoding can lead to errors or unexpected outcomes (like the problem you are trying to troubleshoot now).
The advantages of a csv is that it is basically just a text file with minimal additional encoding, so you don't usually have the same potential for things to go awry. In fact you can choose to import the entire file as polynominal data and then deal with all your subsequent data conversions manually within RapidMiner using the ETL operators which gives you a much finer degree of control. The disadvantage is that you might have to convert your original data source into a csv file first if it originated in Excel or some other format, and that doing that extra data ETL for conversions, etc. inside RapidMiner can be a bit of a pain.5