Date importing in different time zones
mseeds
New Altair Community Member
My team and I sit in different time zones. I created a data set on our server where the first row reads Mar 1, 2019. Because I provided no time, it sets my time as midnight. My time zone is in Texas, CST. If I copy and paste that value into a notepad it comes in as:
My teammate sits in California and the date is displayed to him as Feb 28, 2019. His timezone is 2 hours behind mine. When he copies and pastes it reads
We are running into other date-time issues in joins because of this based on who imported the file to their system, appended or joined, and then shared the results. We have data sets where the date is displaying as off by one day for one of us, or having problems joining on date because of the 2 hour time difference.
What is the best practice for importing dates when you know you'll be working across time zones? We are currently exploring a workaround to import the dates as nominals and then use the community building block to transform them into RapidMiner dates, but that feels like a clumsy solution.
Thanks,
Maggie
@sgenzer
Fri Mar 01 00:00:00 CST 2019 |
Thu Feb 28 22:00:00 PST 2019
We are running into other date-time issues in joins because of this based on who imported the file to their system, appended or joined, and then shared the results. We have data sets where the date is displaying as off by one day for one of us, or having problems joining on date because of the 2 hour time difference.
What is the best practice for importing dates when you know you'll be working across time zones? We are currently exploring a workaround to import the dates as nominals and then use the community building block to transform them into RapidMiner dates, but that feels like a clumsy solution.
Thanks,
Maggie
@sgenzer
Tagged:
0
Best Answers
-
Hello, @mseeds!Yes, I have a few ideas, but not quite good ones. If you want, we can iterate to figure out the best solution for you.Working with dates is quite painful (it's not RapidMiner's fault, though... I have seen very strange things with PostgreSQL as well) and I normally end up generating a new attribute named "Nominal Date" or so, with the following structure:
concat(<br> str(date_get(Date, DATE_UNIT_YEAR)),<br> "-",<br> if (<br> date_get(Date, DATE_UNIT_MONTH) + 1 < 10,<br> concat("0", str(date_get(Date, DATE_UNIT_MONTH) + 1)),<br> str(date_get(Date, DATE_UNIT_MONTH) + 1)<br> ),<br> "-",<br> if (<br> date_get(Date, DATE_UNIT_DAY) < 10,<br> concat("0", str(date_get(Date, DATE_UNIT_DAY))),<br> str(date_get(Date, DATE_UNIT_DAY))<br> )<br>)
I attached the process that showcases this, if anyone wants to make a building block from this, I'll happily share the other 23 tricks I have for date handling, because RapidMiner does not handle dates (in the sense of just dates).Now, let me just write about my frustration just because I was battling with something similar literally an hour ago. I apologize in advance.Prior to the IEEE 754 standard (and others), one could define a date with what's called a data picture, so defining a date could be defined as 9999-12-31, and this could save any kind of occidental date. By the way, the Y2K problem that happened 19 years ago was in part because COBOL developers could define dates as 99-12-31 in order to save precious bytes on their computers. Well, despite having a few standards for data types, no such thing exists for dates (except for the ICU project, but that is more an implementation to overcome the vast amount of quirks that exist with handling dates). Having this in mind, language implementation designers defined two kinds of dates: one is called the EPOCH integer (counted since January 1, 1970 UTC), and the other is the timestamp structure.Now, RapidMiner handles both EPOCH and the full timestamp structure, but it seems that on its implementation there is no distinction between a datetime, a date, a time, a datetime with timezone, a time with timezone or a timestamp internally. All of these are stored as timestamps with timezone (simpler for calculations but working across many timezones is as painful as hitting the corner of a wall with the small finger of your foot) but shown as required, using the data picture concept we described above.Your suggestion of working with dates as nominals is one possible case, but it all depends on what is your use case: perhaps working with a network of call centers require timestamps of calls to be registered with timezone, while certain processes are valid just for months of a year, quarters of a year and some more things.@mseeds let me know if the solution here suits you. If not, we can iterate over a few things and figure out how to help you.All the best,Rod.1 -
Hi,
Lovely topic! Working with dates is always fun, especially with things like DaylightSavingTime, switching seconds, weird places that sometimes skip entire days (because why not?), etc etc etc. There are so many oddities and exceptions all around this topic, it's ludicrous. If we just had UTC and everywhere on earth it would be exactly the same time with no irregularities and exceptions, the world would be a much better place and millions of hours or productivity would have been and could be saved..
Anyhow, I digress. So here's the deal: Internally, all datetimes are stored as milliseconds from epoch (see https://currentmillis.com/). So if you have a datetime column, the actual data will be identical for you and your colleagues, no matter what local timezones you have. However, because humans like their own timezone which they are familiar with, we display dates in your local timezone (or the one you have selected in the preferences of Studio). That however is just the display, the actual value in the data is still milliseconds from epoch.
Joining on that is not a problem, as again, the milliseconds from epoch are identical.
However you can start running into problems if you're doing some conversions to new date objects or strings, and if on the way the timezone (UTC) gets silently lost due to a misconfiguration or a bug, then for example the system timezone might be assumed and it all goes to hell.
Also if you both at the same time create a date from "now()", it would be the exact same value in the data, no matter where in the world you are - with one notable problem: It uses the system clock to do so. If the system clock is out of sync, the previous statement is no longer true.
Regards,
Marco3 -
Hi Maggie,I would never transform the original data because it would be problematic to actually see where the data is generated. Instead, I would generate new attributes for my particular problems. Generating the date as nominal and parse when it's convenient with the Nominal to Date operator, keeping the original attribute seems the right thing to do, even if it looks like a bandaid.Let me see if I can send you a few examples. I'm in the middle of a meeting now.All the best,Rodrigo.5
-
Marco's suggestion is the same approuch we take (multi regional data with local time flavor). So date to numeric gives us a unified format, and if we need to do date manipulation later onwards it's just a matter of numeric to date again.
Much safer as using nominal conversion and start fooling around with that.2
Answers
-
Hello @mseeds
You can check time zone in Settings --> Preferences --> General. I think it would be better to follow a standard time like GMT or UTC so that system time doesn't affect your process (or) people in different time zones can set their studio time zone similar to server time zone. But, if you change something on the server, then change it everywhere (Studio).
Here is a detailed explanation:
https://community.rapidminer.com/discussion/36312/change-system-time
@rfuentealba your insights are helpful here.
Hope this helps.
0 -
Hello, @mseeds!Yes, I have a few ideas, but not quite good ones. If you want, we can iterate to figure out the best solution for you.Working with dates is quite painful (it's not RapidMiner's fault, though... I have seen very strange things with PostgreSQL as well) and I normally end up generating a new attribute named "Nominal Date" or so, with the following structure:
concat(<br> str(date_get(Date, DATE_UNIT_YEAR)),<br> "-",<br> if (<br> date_get(Date, DATE_UNIT_MONTH) + 1 < 10,<br> concat("0", str(date_get(Date, DATE_UNIT_MONTH) + 1)),<br> str(date_get(Date, DATE_UNIT_MONTH) + 1)<br> ),<br> "-",<br> if (<br> date_get(Date, DATE_UNIT_DAY) < 10,<br> concat("0", str(date_get(Date, DATE_UNIT_DAY))),<br> str(date_get(Date, DATE_UNIT_DAY))<br> )<br>)
I attached the process that showcases this, if anyone wants to make a building block from this, I'll happily share the other 23 tricks I have for date handling, because RapidMiner does not handle dates (in the sense of just dates).Now, let me just write about my frustration just because I was battling with something similar literally an hour ago. I apologize in advance.Prior to the IEEE 754 standard (and others), one could define a date with what's called a data picture, so defining a date could be defined as 9999-12-31, and this could save any kind of occidental date. By the way, the Y2K problem that happened 19 years ago was in part because COBOL developers could define dates as 99-12-31 in order to save precious bytes on their computers. Well, despite having a few standards for data types, no such thing exists for dates (except for the ICU project, but that is more an implementation to overcome the vast amount of quirks that exist with handling dates). Having this in mind, language implementation designers defined two kinds of dates: one is called the EPOCH integer (counted since January 1, 1970 UTC), and the other is the timestamp structure.Now, RapidMiner handles both EPOCH and the full timestamp structure, but it seems that on its implementation there is no distinction between a datetime, a date, a time, a datetime with timezone, a time with timezone or a timestamp internally. All of these are stored as timestamps with timezone (simpler for calculations but working across many timezones is as painful as hitting the corner of a wall with the small finger of your foot) but shown as required, using the data picture concept we described above.Your suggestion of working with dates as nominals is one possible case, but it all depends on what is your use case: perhaps working with a network of call centers require timestamps of calls to be registered with timezone, while certain processes are valid just for months of a year, quarters of a year and some more things.@mseeds let me know if the solution here suits you. If not, we can iterate over a few things and figure out how to help you.All the best,Rod.1 -
Hi,
Lovely topic! Working with dates is always fun, especially with things like DaylightSavingTime, switching seconds, weird places that sometimes skip entire days (because why not?), etc etc etc. There are so many oddities and exceptions all around this topic, it's ludicrous. If we just had UTC and everywhere on earth it would be exactly the same time with no irregularities and exceptions, the world would be a much better place and millions of hours or productivity would have been and could be saved..
Anyhow, I digress. So here's the deal: Internally, all datetimes are stored as milliseconds from epoch (see https://currentmillis.com/). So if you have a datetime column, the actual data will be identical for you and your colleagues, no matter what local timezones you have. However, because humans like their own timezone which they are familiar with, we display dates in your local timezone (or the one you have selected in the preferences of Studio). That however is just the display, the actual value in the data is still milliseconds from epoch.
Joining on that is not a problem, as again, the milliseconds from epoch are identical.
However you can start running into problems if you're doing some conversions to new date objects or strings, and if on the way the timezone (UTC) gets silently lost due to a misconfiguration or a bug, then for example the system timezone might be assumed and it all goes to hell.
Also if you both at the same time create a date from "now()", it would be the exact same value in the data, no matter where in the world you are - with one notable problem: It uses the system clock to do so. If the system clock is out of sync, the previous statement is no longer true.
Regards,
Marco3 -
Hi Rod -
Thanks for all the background info! For my use case, I only need the date value as I'm aggregating rows to the date level. To clarify, is your suggestion to transform all dates to nominal and not use date types? That seems to be a bit of a bandaid since I want to be able to use date functions (e.g. extract day of week).
rfuentealba
0 -
Hi Maggie,I would never transform the original data because it would be problematic to actually see where the data is generated. Instead, I would generate new attributes for my particular problems. Generating the date as nominal and parse when it's convenient with the Nominal to Date operator, keeping the original attribute seems the right thing to do, even if it looks like a bandaid.Let me see if I can send you a few examples. I'm in the middle of a meeting now.All the best,Rodrigo.5
-
Marco's suggestion is the same approuch we take (multi regional data with local time flavor). So date to numeric gives us a unified format, and if we need to do date manipulation later onwards it's just a matter of numeric to date again.
Much safer as using nominal conversion and start fooling around with that.2