ValiDate ISO 8601 by RX
PCRE: 603 940 947 949 956 bytes
^\s*[+-]?(\d{4,10}-((00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5])|(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31|W(0[1-9]|[1-4]\d|5[0-2])-[1-7]))|((\d{2,8}([13579][26]|[2468][048]|0[48])|(\d{0,6}([13579][26]|[02468][048])00))-(366|02-29))|(\+?\d{0,6}(([02468][048]|[13579][26])([26]0|71|[38]2|[49]3|[05]4|15|[27]6|37|[48]8|[09]9)|([02468][159]|[13579][37])(50|[16]1|[27]2|33|[48]4|[09]5|[15]6|67|[27]8|[38]9)|([02468][26]|[13579][048])([48]0|[09]1|[15]2|63|[27]4|[38]5|[49]6|[05]7|[16]8|29)|([02468][37]|[13579][159])([27]0|[38]1|[49]2|[05]3|[16]4|25|[37]6|87|[049]8|[5]9))|-\d{0,6}(([02468][048]|[13579][26])(0[28]|1[39]|24|3[06]|4[17]|5[28]|6[49]|75|8[06]|9[27])|([02468][159]|[13579][37])(0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39])|([02468][26]|[13579][048])(05|1[16]|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95)|([02468][37]|[13579][159])(0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16])))-W53-[1-7]\s*$
Note: Some pairs of parentheses could possibly be dropped.
- Test valid dates and formats
- Test invalid dates
- Test invalid formats
Divisibility by 4
The multiples of 4 repeat in a simple pattern:
- 00, 04, 08, 12, 16,
20, 24, 28, 32, 36,
40, 44, 48, 52, 56,
60, 64, 68, 72, 76,
80, 84, 88, 92, 96, …
This, or the inverse, could be matched by a likewise simple regular expression for all two-digit numbers with leading zero:
(?<divisible-by-four>[13579][26]|[02468][048])
(?<not-divisible-by-four>[13579][048]|[02468][26]|\d[13579])
It could save some bytes if there were character classes for odd and even digits (like \o and \e), but there are none as far as I’m aware.
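As a quick sanity check (not part of the golfed answer), both expressions can be compared against plain modular arithmetic for every two-digit string. This is a minimal sketch in Python; the named groups are dropped because Python's re module uses a different syntax for them and does not accept hyphens in group names.

```python
import re

# The two alternations from above, without the named-group wrappers.
DIV4 = re.compile(r"[13579][26]|[02468][048]")
NOT_DIV4 = re.compile(r"[13579][048]|[02468][26]|\d[13579]")

for n in range(100):
    s = f"{n:02d}"  # two digits with leading zero
    assert (DIV4.fullmatch(s) is not None) == (n % 4 == 0)
    assert (NOT_DIV4.fullmatch(s) is not None) == (n % 4 != 0)
```

Both loops pass silently, so the two classes partition 00 through 99 exactly.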
Years
That expression would suffice for the Julian calendar, but Gregorian leap year detection needs to special-case a trailing 00, where the century number itself must be divisible by 4:
(?<leap-year>[+-]?(\d{2,8}([13579][26]|[2468][048]|0[48])|(\d{0,6}([13579][26]|[02468][048])00)))
(?<year>[+-]?\d{4,10})
This would need some changes to outlaw -0000-… (along with -00000-… etc.) or to enforce the plus sign for positive year numbers with more than 4 digits. The latter would be rather simple, but is not required:
(?<leap-year>([+-]?(\d\d([13579][26]|[2468][048]|0[48])|(([13579][26]|[02468][048])00)))|([+-](\d{3,8}([13579][26]|[2468][048]|0[48])|(\d{1,6}([13579][26]|[02468][048])00))))
(?<year>([+-]?\d{4})|([+-]\d{5,10}))
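The simpler leap-year expression can be checked against Python's calendar.isleap for all four-digit years. This is only a sketch for verification; the names LEAP_YEAR and regex_says_leap are made up for the example, and the named group is dropped because Python's re does not accept hyphenated group names.

```python
import calendar
import re

# Body of the first leap-year expression above (sign optional).
LEAP_YEAR = re.compile(
    r"[+-]?(\d{2,8}([13579][26]|[2468][048]|0[48])"
    r"|(\d{0,6}([13579][26]|[02468][048])00))"
)

def regex_says_leap(year):
    """True if the zero-padded four-digit year matches the leap-year regex."""
    return LEAP_YEAR.fullmatch(f"{year:04d}") is not None

# Years where the regex and the Gregorian rule disagree.
mismatches = [y for y in range(10000) if regex_says_leap(y) != calendar.isleap(y)]
print(mismatches)
```

An empty list means the regex agrees with the Gregorian rule for 0000 through 9999, including the century cases such as 1900 and 2000.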
Day of year
Three-digit ordinal dates are rather simple: we just have to restrict -366 to leap years (and disallow -000).
(?<ordinal-day>-(00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5]))
(?<ordinal-leap-day>-366)
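To see which ordinals the common-year expression accepts, one can simply enumerate all three-digit strings. A small Python sketch, with the named group removed:

```python
import re

# Common-year ordinal days, copied from the expression above.
ORDINAL_DAY = re.compile(r"-(00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5])")

# Collect every three-digit number that the expression accepts.
accepted = [n for n in range(1000) if ORDINAL_DAY.fullmatch(f"-{n:03d}")]
assert accepted == list(range(1, 366))  # exactly 001 through 365
```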
Day of month of year
The seven months with 31 days are 01 January, 03 March, 05 May, 07 July, 08 August, 10 October and 12 December. Just four months have exactly 30 days: 04 April, 06 June, 09 September and 11 November. Finally, 02 February has 28 days in common years and 29 in leap years. We can first construct a regular expression for the always valid days 01 through 28 and then add special cases.
(?<month-day>-(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8]))
(?<short-month-day>-(0[13-9]|1[0-2])-(29|30))
(?<long-month-day>-(0[13578]|1[02])-31)
(?<month-leap-day>-02-29)
Neither month nor day may be 00, which an earlier version did not check.
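The four month-day expressions can be tested against datetime.date, using one common year and one leap year as references. This is a sketch only; COMMON, LEAP_DAY and exists are names chosen for the example, and 2015 and 2016 are arbitrary sample years.

```python
import re
from datetime import date

COMMON = re.compile(
    r"-(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8])"  # days 01-28 of any month
    r"|-(0[13-9]|1[0-2])-(29|30)"            # days 29 and 30, except February
    r"|-(0[13578]|1[02])-31"                 # day 31 of the seven long months
)
LEAP_DAY = re.compile(r"-02-29")

def exists(year, month, day):
    """True if the calendar date exists according to datetime."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

# Include month 00 and 13 and day 00 and 32 as invalid probes.
for month in range(0, 14):
    for day in range(0, 33):
        s = f"-{month:02d}-{day:02d}"
        common = COMMON.fullmatch(s) is not None
        leap = common or LEAP_DAY.fullmatch(s) is not None
        assert common == exists(2015, month, day)
        assert leap == exists(2016, month, day)
```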
Day of week of year
All years include at least 52 weeks:
(?<week-day>-W(0[1-9]|[1-4]\d|5[0-2])-[1-7])
Long years that include -W53 repeat in a 400-year cycle. Add 2000 for the current cycle, e.g. the third entry, 015, corresponds to 2015:
- 004, 009, 015, 020, 026, 032, 037, 043, 048, 054, 060, 065, 071, 076, 082, 088, 093, 099,
- 105, 111, 116, 122, 128, 133, 139, 144, 150, 156, 161, 167, 172, 178, 184, 189, 195,
- 201, 207, 212, 218, 224, 229, 235, 240, 246, 252, 257, 263, 268, 274, 280, 285, 291, 296,
- 303, 308, 314, 320, 325, 331, 336, 342, 348, 353, 359, 364, 370, 376, 381, 387, 392, 398.
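The list above can be reproduced with Python's datetime, relying on the fact that 28 December always falls in the last ISO week of its year. A minimal sketch; long_year_offsets is a name invented for this example.

```python
from datetime import date

def long_year_offsets(cycle_start=2000):
    """Offsets n in 0..399 for which year cycle_start + n has an ISO week 53."""
    return [y - cycle_start
            for y in range(cycle_start, cycle_start + 400)
            if date(y, 12, 28).isocalendar()[1] == 53]

offsets = long_year_offsets()
print(len(offsets))                                # number of long years per cycle
print(" ".join(f"{n:03d}" for n in offsets[:5]))   # first few entries
```

There are 71 long years per cycle, which is why each century row holds 17 or 18 entries.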
Each of the four centuries has a unique pattern. There is probably not much room for optimization.
04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99
05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95
01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96
03|08|14|20|25|31|36|42|48|53|59|64|70|76|81|87|92|98
We can group by either digit to find out that we can save two bytes or so:
- Grouped by 1st digit.
0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39]
05|1[16]|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95
0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16]
0[38]|14|2[05]|3[16]|4[28]|5[39]|64|7[06]|8[17]|9[28]
- Grouped by 2nd digit.
[26]0|71|[38]2|[49]3|[05]4|15|[27]6|37|[48]8|[09]9
50|[16]1|[27]2|33|[48]4|[09]5|[15]6|67|[27]8|[38]9
[48]0|[09]1|[15]2|63|[27]4|[38]5|[49]6|[05]7|[16]8|29
[27]0|[38]1|[49]2|[05]3|[16]4|25|[37]6|87|[049]8|[5]9
The century number is easily matched again by a variation of the divisibility expression.
- 1st century:
[02468][048]|[13579][26]
- 2nd century:
[02468][159]|[13579][37]
- 3rd century:
[02468][26]|[13579][048]
- 4th century:
[02468][37]|[13579][159]
So far, this only works for positive years, including year zero. For negative years, we have to subtract the values in the list above from 400 and do the rest again, because the pattern is not symmetric.
02|08|13|19|24|30|36|41|47|52|58|64|69|75|80|86|92|97
04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99
05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95
01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96
or
0[28]|1[39]|24|3[06]|4[17]|5[28]|6[49]|75|8[06]|9[27]
0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39]
05|1[16]|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95
0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16]
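The subtraction from 400 can be sketched in a few lines of Python. The code assumes astronomical year numbering, so that year -Y sits at position (400 - Y) mod 400 in the cycle; negative_endings is a hypothetical helper name.

```python
from datetime import date

# Cycle positions of long years, computed for 2000-2399.
LONG = {y - 2000 for y in range(2000, 2400)
        if date(y, 12, 28).isocalendar()[1] == 53}

def negative_endings(century_class):
    """Two-digit endings yy for which year -(100 * century_class + yy) is long."""
    start = 100 * century_class
    return [y - start for y in range(start, start + 100)
            if (400 - y) % 400 in LONG]

# One row per century class, in the same order as the four rows above.
for c in range(4):
    print("|".join(f"{e:02d}" for e in negative_endings(c)))
```

The third row it prints begins 05|11|16, so the grouped form of that row has to cover 05, 11 and 16.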
Putting it all together
Any year
[+-]?\d{4,10}-((00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5])|(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31|W(0[1-9]|[1-4]\d|5[0-2])-[1-7])
Leap-day year additions
[+-]?(\d{2,8}([13579][26]|[2468][048]|0[48])|(\d{0,6}([13579][26]|[02468][048])00))-(366|02-29)
Leap-week year additions
\+?\d{0,6}(([02468][048]|[13579][26])([26]0|71|[38]2|[49]3|[05]4|15|[27]6|37|[48]8|[09]9)|([02468][159]|[13579][37])(50|[16]1|[27]2|33|[48]4|[09]5|[15]6|67|[27]8|[38]9)|([02468][26]|[13579][048])([48]0|[09]1|[15]2|63|[27]4|[38]5|[49]6|[05]7|[16]8|29)|([02468][37]|[13579][159])([27]0|[38]1|[49]2|[05]3|[16]4|25|[37]6|87|[049]8|[5]9))-W53-[1-7]
-\d{0,6}(([02468][048]|[13579][26])(0[28]|1[39]|24|3[06]|4[17]|5[28]|6[49]|75|8[06]|9[27])|([02468][159]|[13579][37])(0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39])|([02468][26]|[13579][048])(05|1[16]|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95)|([02468][37]|[13579][159])(0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16]))-W53-[1-7]
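One way to combine the four parts, assuming they are meant as alternatives inside a single group that is anchored at both ends, is sketched below in Python. The helper names (week53, is_valid) and the table layout are made up for this example; it is not the golfed form and makes no attempt to be short.

```python
import re

ANY_YEAR = (
    r"[+-]?\d{4,10}-((00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5])"
    r"|(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8])|(0[13-9]|1[0-2])-(29|30)"
    r"|(0[13578]|1[02])-31|W(0[1-9]|[1-4]\d|5[0-2])-[1-7])"
)
LEAP_DAY = (
    r"[+-]?(\d{2,8}([13579][26]|[2468][048]|0[48])"
    r"|(\d{0,6}([13579][26]|[02468][048])00))-(366|02-29)"
)
CENTURY = [
    r"[02468][048]|[13579][26]", r"[02468][159]|[13579][37]",
    r"[02468][26]|[13579][048]", r"[02468][37]|[13579][159]",
]
POSITIVE = [
    r"[26]0|71|[38]2|[49]3|[05]4|15|[27]6|37|[48]8|[09]9",
    r"50|[16]1|[27]2|33|[48]4|[09]5|[15]6|67|[27]8|[38]9",
    r"[48]0|[09]1|[15]2|63|[27]4|[38]5|[49]6|[05]7|[16]8|29",
    r"[27]0|[38]1|[49]2|[05]3|[16]4|25|[37]6|87|[049]8|[5]9",
]
NEGATIVE = [
    r"0[28]|1[39]|24|3[06]|4[17]|5[28]|6[49]|75|8[06]|9[27]",
    r"0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39]",
    r"05|1[16]|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95",  # covers 05, 11, 16
    r"0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16]",
]

def week53(sign, endings):
    """Build the -W53 alternative for one sign from the century/ending tables."""
    rows = "|".join(f"({c})({e})" for c, e in zip(CENTURY, endings))
    return rf"{sign}\d{{0,6}}({rows})-W53-[1-7]"

PARTS = [ANY_YEAR, LEAP_DAY, week53(r"\+?", POSITIVE), week53("-", NEGATIVE)]
# The non-capturing group keeps the alternation inside the anchors.
ISO_DATE = re.compile(r"\s*(?:" + "|".join(PARTS) + r")\s*")

def is_valid(text):
    """True if the whole string is one valid date (fullmatch anchors both ends)."""
    return ISO_DATE.fullmatch(text) is not None

for sample in ["2016-02-29", "2015-02-29", "2015-W53-7", "2016-W53-1", "2016-366"]:
    print(sample, is_valid(sample))
```

Wrapping the alternatives in one group matters here: without it, the leading and trailing anchors would bind only to the first and last alternative.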
PCRE (also Perl), 778 bytes
/^([+-]?\d*((([02468][048]|[13579][26]|\d\d(?!00))([02468][048]|[13579][26]))|\d{4}(?!-02-29|-366))-((?!02-3|(0[469]|11)-31|000)((0[1-9]|1[012])-(0[1-9]|[12]\d|30|31)|([012]\d\d|3([0-5]\d|6[0-6])))|(W(?!00)([0-4]\d|51|52)-[1-7]))|((\+?\d*([02468][048]|[13579][26])|-\d*([02468][159]|[13579][37]))(04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99)|(\+?\d*([02468][159]|[13579][37])|-\d*([02468][26]|[13579][048]))(05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95)|(\+?\d*([02468][26]|[13579][048])|-\d*([02468][37]|[13579][159]))(01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96)|\+?\d*(([02468][37]|[13579][159])(03|14|20|25|31|36|42|53|59|64|70|76|81|87|92|[049]8))|-\d*(([02468][048]|[13579][26])([059]2|08|13|19|24|30|36|41|47|58|64|69|75|80|86|97)))-W53-[1-7])$/
I have included the delimiters in the byte count to show that it doesn't rely on any flags.
It does not match valid dates within other strings, such as 1234-56-89 2016-02-29 9876-54-32. The regex is shorter because it does not enforce a maximum of 10 digits for the year.
Extended with comments:
/^ # Start of pattern (no leading space)
(
# YEAR
# Optional sign and digits if more than 4 in year
[+-]?\d*(
# Years 00??, 04??, 08?? ... 92??, 96?? OR dd not followed by 00
# followed by 00, 04, 08 ... 92, 96 OR
(([02468][048]|[13579][26]|\d\d(?!00))([02468][048]|[13579][26])) |
# any year not followed by 29 February or day 366
\d{4}(?!-02-29|-366)
# dash
) -
# MONTH AND DAY, or DAY OF YEAR, or WEEK OF YEAR AND DAY if less than 53 weeks
(
# Not (30 or 31 February OR 31 April, June, September or November OR day 0)
(?!02-3|(0[469]|11)-31|000)
(
# Month dash day OR
(0[1-9]|1[012]) - (0[1-9]|[12]\d|30|31) |
# 001-299 OR 300-359 OR 360-366
([012]\d\d | 3([0-5]\d | 6[0-6]))
# OR
) |
(
# W 01-52 dash 1-7
W(?!00)([0-4]\d|51|52)-[1-7]
)
# OR
) |
# WEEK OF YEAR AND DAY only if week is 53
(
# Optional plus and extra year digits
\+?\d*(
# Years +0303 - +9998
([02468][37]|[13579][159])(03|14|20|25|31|36|42|53|59|64|70|76|81|87|92|[049]8)
) |
# Minus and extra year digits
-\d*(
# Years -0002 - -9697
([02468][048]|[13579][26])([059]2|08|13|19|24|30|36|41|47|58|64|69|75|80|86|97)
) |
# Years +0004 - +9699, -0104 - -9799
(\+?\d*([02468][048]|[13579][26])|-\d*([02468][159]|[13579][37]))
(04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99) |
# Years +0105 - +9795, -0205 - -9895
(\+?\d*([02468][159]|[13579][37])|-\d*([02468][26]|[13579][048]))
(05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95) |
# Years +0201 - +9896, -0301 - -9996
(\+?\d*([02468][26]|[13579][048])|-\d*([02468][37]|[13579][159]))
(01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96)
# dash W 53 dash 1-7
)-W53-[1-7]
# End of pattern (no trailing space)
)$/x