Handling timezones in storage?
The answer, as always, is "depends".
It depends on what you are describing with the time, and how the data was provided to you. The key to deciding how to store time values is deciding if you are losing information by dropping the timezone, as well as not surprising your users.
There are definite benefits in storing data in a UTC time_t - it is a single int, allowing quick sorting and easy storage.
I see the problem as being broken down into specific areas:
- Historical Data
- Future, Short Term Data
- Future, Long Term Data
With the following subclasses on each:
- System Provided
- User Provided
Let's look at them one at a time.
System Provided: I would recommend running systems in UTC, then you avoid the timezone problem and again, no information loss is seen (it's always UTC).
Historical Data: These are things like system log files, process statistics, tracing, comment dates/times, etc. The data isn't going to change, and the timezone descriptor isn't going to change retroactively. For this type of data, there is no information lost by storing the information in UTC regardless of the timezone it was provided in. So, drop the timezone.
Future, Long Term Data: These are events that are either far enough in the future or will keep happening. If they are kept around long enough, the timezone descriptors are guaranteed change. A good example of this type of data is, "The Weekly Management Meeting". This is data that is entered once, and expected to keep working into perpetuity. For these values, it is important to determine if it is system or user provided. For user-provided data, the time should be stored with the creator's timezone, anything else results in information loss. This information loss becomes apparent when the timezone definition changes and the time is displayed to the creator as having an entirely different value!
As Bwooce has indicated, there is some confusion where the creator and viewer are in different timezones. In that case, I would expect the application to indicate which time values have moved due to a timezone shift from their previous locations.
Future, Short Term Data: This is data that is quickly going to become historical, or is only valid for a short period of time. Examples could be interval timers, rating transitions, etc. For this data, since the likelihood is low that the definition will change between the creation of the value and the time it becomes historical, it might be possible to get away with dropping the timezone. However, I have found that these values have a bad habit of becoming "Future, Long Term Data".
Once you have decided to store the timezone, care must be taken with how it is stored.
- Don't store the timezone as an offset, or the full descriptor.
If you store a timezone as an offset, what do you do if the timezone changes? Do you go through the system and do a blanket change on the existing data? If you do, you've now made any historical values incorrect. Good examples of this fault are Oracle and iCal. Oracle stores timezone information as an offset from UTC, and iCal includes the full descriptor for the creation timezone.
- Do store it as a name.
This allows the definition of the timezone to change without having to modify the existing values you have. It does make sorting more difficult, since any index that is generated may be invalid if the timezone data changes.
If developers continue to store everything in UTC, irrespective of timezone, we will continue to see the problems that we've seen with the last batch of timezone changes.
At one organisation, the secretaries had to print out the calendars for their teams before the daylight savings date, and then print them out again after the change. Finally, they compared the two and re-created all of the appointments that had moved. Of course, they missed several, and there were several weeks of pain until the old daylight savings date was reached and the times became correct again.
You have to store in UTC - if you don't, your historic reporting and behaviour during things like Daylight Savings goes... funny. GMT is a local time, subject to Daylight Savings relative to UTC (which is not).
Presentation to users in different time-zones can be a real bastard if you're storing local time. It's easy to adjust to local if your raw data is in UTC - just add your user's offset and you're done!
Joel talked about this in one of the podcasts (in a round-about way) - he said to store your data in the highest resolution possible (search for 'fidelity'), because you can always munge it when it goes out again. That's why I say store it as UTC, as local time you need to adjust for anyone who's not in that timezone, and that's a lot of hard work. And you need to store whether, for example, daylight savings was in effect when you stored the time. Yuk.
Often in databases in the past I've stored two - UTC for sorting, local time for display. That way neither the user nor the computer get confused.
Now, as to display: Sure, you can do the "3 minutes ago" thing, but only if you store UTC - otherwise, data entered in different timezones is going to do things like display as "-4 hours ago", which will freak people out. If you're going to display an actual time, people love to have it in their local time - and if data's being entered in multiple timezones you can only do that with ease if you're storing UTC.
Josh is completely correct above, but I have one subtle caveat to explain. This is a case with no correct answer regarding future events and timezones.
Consider the case of a repeating appointment. It occurs at GMT 0000 (for simplicity), which is 1200 NZST (New Zealand Standard Time) and 1000 AEST in Sydney Australia.
When Daylight Savings comes into effect in one zone, what should occur to the appointment? Should it:
1a. If the TZ change is in the zone of
the appointment's "owner" (who
booked it) then attempt to remain at
the same desk clock time (eg 10:00am)?
1b. If the
TZ change is in one of the other
meeting attendee's zones then no
change
Consequences: It moves for everyone else, unexpectedly, due to the owners TZ change, but it stays "the 10am meeting" as far as the owner is concerned.
'2. As above, but reversed.
Consequences: It moves for the meeting owner (the 10am meeting becomes the 9am meeting, or v/v), which may be expected but inconvenient. It stays at the same desk clock time for the other attendees until they go through their own TZ transition.
Neither is perfect. Consider the case of two appointments, one booked by Person A that occurs at 10am local time, the other booked by Person B with Person A as an attendee that occurs at 9am. If Person A and Person B are in different TZ's then a DST change could easily cause them to become double-booked.
If your mind is a bit bent at this point then I quite understand.
The point behind this example is that to do either of these behaviors properly you need to know not just the UTC version of the local time, but the TZ (and not the offset) that the owner was in when they booked it. Otherwise you have no choice but to take option 2, silently, without even informing anyone that things have changed since GMT times don't change and only the presentation changes...right? (no, this is the trap, presentation matters when your 10am meeting moves by itself)
I have to credit my colleague and friend Jason Pollock for this insight. Read his view here, and the follow-up discussing iCal and VTIMEZONE here.