Correctness of Sakamoto's algorithm to find the day of week
Well, you can tell just by looking at it that it is correct... Assuming that the t[]
array is correct, which you can verify with just 12 spot checks (one for each month using any day/year).
The y -= m < 3
is a nice trick. It creates a "virtual year" that starts on March 1 and ends on February 28 (or 29), putting the extra day (if any) at the end of the year; or rather, at the end of the previous year. So for example, virtual year 2011 began on Mar 1 and will end on February 29, while virtual year 2012 will begin on March 1 and end on the following February 28.
By putting the added day for leap years at the end of the virtual year, the rest of the expression is massively simplified.
Let's look at the sum:
(y + y/4 - y/100 + y/400 + t[m-1] + d) % 7
There are 365 days in a normal year. That is 52 weeks plus 1 day. So the day of the week shifts by one day per year, in general. That is what the y
term is contributing; it adds one to the day for each year.
But every four years is a leap year. Those contribute an extra day every four years. Thanks to the use of virtual years, we can just add y/4
to the sum to count how many leap days happen in y
years. (Note that this formula assumes integer division rounds down.)
But that is not quite right, because every 100 years is not a leap year. So we have to subtract off y/100
.
Except that every 400 years is a leap year again. So we have to add y/400
.
Finally we just add the day of the month d
and an offset from a table that depends on the month (because the month boundaries within the year are fairly arbitrary).
Then take the whole thing mod 7 since that is how long a week is.
(If weeks were eight days, for example, what would change in this formula? Well, it would be mod 8, obviously. Also the y
would need to be 5*y
, because 365 % 8 == 5. Also the month table t[]
would need adjusting. That's it.)
Incidentally, Wikipedia's statement that the calendar is "good until 9999" is totally arbitrary. This formula is good for however long we stick with the Gregorian calendar, whether that is 10 years, 100 years, 1000 years, or 1 million years.
[edit]
The above argument is essentially a proof by induction. That is, assuming that the formula works for a particular (y,m,d), you prove that it works for (y+1,m,d) and (y,m,d+1). (Where y is a "virtual year" starting March 1.) So the key question is, does the sum change by the correct amount as you move from one year to the next? With knowledge of the leap year rules, and with the "virtual year" having the extra day at year end, it trivially does.
Recently I wrote blog post about this algorithm here.
The basic idea behind algorithm is to for February and January to count day
of week from 31 Dec of the previous year. For all other months we will be counting day of week from current year 31 Dec. We do this in two
steps first we compute day of week of last day of month preceding current month m
then we just add d
modulo seven.
31 Dec 1 BC is Sunday that is encoded as 0, Monday is 1 etc.
So we have: 0 + y + y/4 - y/100 + y/400
this with y -= m < 3
computes
day of week of 31 Dec of current year or previous year (depending on month). Note: 365 % 7 == 1
this explains why we wrote y
instead of 365*y
. The last component d
is obvious since we start counting day of week from previous month last day.
The last part that need to be explained are values in array, for first two values these are number of days since last year 31 Dec to start of the month % 7
. For rest of the months they are negated modulo seven number of days from end of prev month to 31 Dec of the current year. In other words we are subtracting days by addition modulo 7 e.g. (a-b)%7 = (a+(7-b%7))%7
.
More explanation you may find in my blog post.