As a software developer or system admin, have you ever encountered regular expressions that are just a bit too hard to understand? Frustrating, right? As a rule, regular expressions are relatively easy to write but pretty hard to read, even when you know what they are supposed to do. Then there is this bugger:
/^(?:([0-9]{4})[\-\/:]?(?:((?:0[1-9])|(?:1[0-2]))[\-\/:]?(?:((?:0[1-9])|(?:(?:1|2)[0-9])
|(?:3[0-1]))[\sT]?(?:((?:[0-1][0-9])|(?:2[0-4]))(?::?([0-5][0-9])?(?::?([0-5][0-9](?:\.[0-9]+)?)?
(Z|(?:([+\-])((?:[0-1][0-9])|(?:2[0-4])):?([0-5][0-9])?))?)?)?)?)?)?)?$/x
This is the most complex regex I’ve ever had to write, and I just had to share it. Can you guess what it might do? 😉
I’ll spare you the suspense: it’s a Perl-compatible regular expression I wrote as part of a PHP library back in 2005 to parse ISO-8601 dates into their year, month, day, hour, minute, second, and time zone offset components. I find the version above too dense to understand, even knowing what it is supposed to do. While writing it, I quickly realized that I had to use the ‘x’ modifier, which makes the regex engine ignore whitespace in the pattern so that I could spread it over multiple lines and insert comments. Here is the more legible version:
/
^ # Start of the line
#-----------------------------------------------------------------------------
(?: # The date component
  ([0-9]{4}) # Four-digit year
  [\-\/:]? # Optional Hyphen, slash, or colon delimiter
  (?: # Two-digit month
    (
      (?: 0[1-9])
      |
      (?: 1[0-2])
    )
    [\-\/:]? # Optional Hyphen, slash, or colon delimiter
    (?: # Two-digit day
      (
        (?: 0[1-9])
        |
        (?: (?: 1|2)[0-9])
        |
        (?: 3[0-1])
      )
      [\sT]? # Optional delimiter
#-----------------------------------------------------------------------------
      (?: # The time component
        ( # Two-digit hour
          (?: [0-1][0-9])
          |
          (?: 2[0-4])
        )
        (?:
          :? # Optional Colon
          ([0-5][0-9])? # Two-digit minute
          (?:
            :? # Optional Colon
            ( # Two-digit second
              [0-5][0-9]
              (?: \.[0-9]+)? # followed by an optional decimal.
            )?
#-----------------------------------------------------------------------------
            ( # Offset component
              Z # Zero offset (UTC)
              | # OR
              (?: # Offset from UTC
                ([+\-]) # Sign of the offset
                ( # Two-digit offset hour
                  (?: [0-1][0-9])
                  |
                  (?: 2[0-4])
                )
                :? # Optional Colon
                ([0-5][0-9])? # Two-digit offset minute
              )
            )?
          )?
        )?
      )?
    )?
  )?
)?
$
/x
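To see which component lands in which capture group, here is a minimal sketch that runs the pattern against a few strings. It is written in Python rather than the PHP of the original library, so treat it as an approximation: `re.VERBOSE` stands in for the ‘x’ modifier, and the names `ISO_8601`, `FIELDS`, and `parse_iso8601` are made up for this example.

```python
import re

# The one-line pattern from the top of the post, minus the /.../x delimiters.
# re.VERBOSE plays the role of the 'x' modifier, so the line breaks inside
# the pattern are ignored.
ISO_8601 = re.compile(r"""
^(?:([0-9]{4})[\-\/:]?(?:((?:0[1-9])|(?:1[0-2]))[\-\/:]?(?:((?:0[1-9])|(?:(?:1|2)[0-9])
|(?:3[0-1]))[\sT]?(?:((?:[0-1][0-9])|(?:2[0-4]))(?::?([0-5][0-9])?(?::?([0-5][0-9](?:\.[0-9]+)?)?
(Z|(?:([+\-])((?:[0-1][0-9])|(?:2[0-4])):?([0-5][0-9])?))?)?)?)?)?)?)?$
""", re.VERBOSE)

# Hypothetical labels for the ten capture groups, in order of appearance.
FIELDS = ("year", "month", "day", "hour", "minute", "second",
          "offset", "offset_sign", "offset_hour", "offset_minute")


def parse_iso8601(text):
    """Return a dict of captured components, or None if the text does not match."""
    match = ISO_8601.match(text)
    if match is None:
        return None
    # Groups that did not take part in the match come back as None.
    return dict(zip(FIELDS, match.groups()))


if __name__ == "__main__":
    for sample in ("2005-08-15T15:52:01+00:00", "20050815", "2005-13-01"):
        print(sample, "->", parse_iso8601(sample))
```

The first two samples match, with the time and offset fields left as `None` for the date-only string. The third returns `None` because 13 is not a valid month. Note that the pattern only checks the shape of the string: it will still accept things like day 31 in any month.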
What’s the craziest regex you’ve come across?

