When to use VARCHAR and DATE/DATETIME
Why not put screws in with a hammer?
Because it isn't the right tool for the job.
Some of the disadvantages of the VARCHAR version:
- You can't easily add / subtract days to the VARCHAR version.
- It is harder to extract just month / year.
- There is nothing stopping you putting non-date data in the VARCHAR column in the database.
- The VARCHAR version is culture specific.
- You can't easily sort the dates.
- It is difficult to change the format if you want to later.
- It is unconventional, which will make it harder for other developers to understand.
- In many environments, using VARCHAR will use more storage space. This may not matter for small amounts of data, but in commercial environments with millions of rows of data this might well make a big difference.
Of course, in your hobby projects you can do what you want. In a professional environment I'd insist on using the right tool for the job.
Between DATE/DATETIME
and VARCHAR
for dates I would go with DATE/DATETIME
everytime. But there is a overlooked third option. Storing it as a INTEGER unsigned!
I decided to go with INTEGER unsigned
in my last project, and I am really satisfied with making that choice instead of storing it as a DATE/DATETIME
. Because I was passing along dates between client and server it made the ideal type for me to use. Instead of having to store it as DATE
and having to convert back every time I select, I just select it and use it however I want it. If you want to select the date as a "human-readable" date you can use the FROM_UNIXTIME()
function.
Also a integer takes up 4 bytes while DATETIME
takes up 8 bytes. Saving 50% storage.
The sorting problem that Berin proposes is also solved using integer as storage for dates.
When you'll have database with more than 2-3 million rows you'll know why it's better to use DATETIME than VARCHAR :)
Simple answer is that with databases - processing power isn't a problem anymore. Just the database size is because of HDD's seek time.
Basically with modern harddisks you can read about 100 records / second if they're read in random order (usually the case) so you must do everything you can to minimize DB size, because:
- The HDD's heads won't have to "travel" this much
- You'll fit more data in RAM
In the end it's always HDD's seek times that will kill you. Eg. some simple GROUP BY query with many rows could take a couple of hours when done on disk compared to couple of seconds when done in RAM => because of seek times.
For VARCHAR's you can't do any searches. If you hate the way how SQL deals with dates so much, just use unix timestamp in 32 bit integer field. You'll have (basically) all advantages of using SQL DATE field, you'll just have to manipulate and format dates using your choosen programming language, not SQL functions.
Two reasons:
- Sorting results by the dates
- Not sensitive to date formatting changes
So let's take for instance a set of records that looks like this:
5/12/1999 | Frank N Stein
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
If we were to store the data your way, but sorted on the dates in assending order SQL will respond with the resultset that looks like this:
1/22/2005 | Drake U. La
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
Where if we stored the dates as a DATETIME, SQL will respond correctly ordering them like this:
10/4/1962 | Goul Friend
5/12/1999 | Frank N. Stein
1/22/2005 | Drake U. La
Additionally, if somewhere down the road you needed to display dates in a different format, for example like YYYY-MM-DD, then you would need to transform all your data or deal with mixed content. When it's stored as a SQL DATE, you are forced to make the transform in code, and very likely have one spot to change the format to display all dates--for free.