As a software developer or system admin, have you ever encountered regular expressions that are just a bit too hard to understand? Kind of frustrating, right? Regular expressions are usually relatively easy to write, but pretty hard to read, even if you know what they are supposed to do. Then there is this bugger:
/^(?:([0-9]{4})[\-\/:]?(?:((?:0[1-9])|(?:1[0-2]))[\-\/:]?(?:((?:0[1-9])|(?:(?:1|2)[0-9]) |(?:3[0-1]))[\sT]?(?:((?:[0-1][0-9])|(?:2[0-4]))(?::?([0-5][0-9])?(?::?([0-5][0-9](?:\.[0-9]+)?)? (Z|(?:([+\-])((?:[0-1][0-9])|(?:2[0-4])):?([0-5][0-9])?))?)?)?)?)?)?)?$/x
This is the most complex regex I’ve ever needed to write, and I just had to share. Can you guess what it might do? 😉
I’ll spare you the suspense: it’s a Perl-compatible regular expression I wrote as part of a PHP library back in 2005 to parse ISO 8601 dates into their year, month, day, hour, minute, second, and time zone offset components. I find the line above too dense to understand, even knowing what it is supposed to do. While writing it, I quickly realized that I had to use the ‘x’ modifier, which makes the regex engine ignore whitespace and treat everything from a ‘#’ to the end of the line as a comment, so that I could spread the pattern over multiple lines and annotate it. Here is the more legible version:
/
^                                   # Start of the line
#-----------------------------------------------------------------------------
(?:                                 # The date component
    ([0-9]{4})                      # Four-digit year
    [\-\/:]?                        # Optional hyphen, slash, or colon delimiter
    (?:                             # Two-digit month
        (
            (?: 0[1-9])
            |
            (?: 1[0-2])
        )
        [\-\/:]?                    # Optional hyphen, slash, or colon delimiter
        (?:                         # Two-digit day
            (
                (?: 0[1-9])
                |
                (?: (?: 1|2)[0-9])
                |
                (?: 3[0-1])
            )
            [\sT]?                  # Optional delimiter
#-----------------------------------------------------------------------------
            (?:                     # The time component
                (                   # Two-digit hour
                    (?: [0-1][0-9])
                    |
                    (?: 2[0-4])
                )
                (?:
                    :?              # Optional colon
                    ([0-5][0-9])?   # Two-digit minute
                    (?:
                        :?          # Optional colon
                        (           # Two-digit second
                            [0-5][0-9]
                            (?: \.[0-9]+)?  # followed by an optional decimal.
                        )?
#-----------------------------------------------------------------------------
                        (           # Offset component
                            Z       # Zero offset (UTC)
                            |       # OR
                            (?:     # Offset from UTC
                                ([+\-])     # Sign of the offset
                                (           # Two-digit offset hour
                                    (?: [0-1][0-9])
                                    |
                                    (?: 2[0-4])
                                )
                                :?          # Optional colon
                                ([0-5][0-9])?   # Two-digit offset minute
                            )
                        )?
                    )?
                )?
            )?
        )?
    )?
)?
$
/x
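If you want to poke at the pattern yourself, here is a minimal sketch. It is a Python port, not the original PHP code (which isn't shown in this post), and it assumes that Python's `re.VERBOSE` flag treats whitespace and `#` comments the same way the ‘x’ modifier does for this particular pattern. The names `ISO_8601`, `FIELDS`, and `parse_iso8601` are made up for illustration and are not part of the 2005 library.

```python
import re

# Python port of the commented pattern above. re.VERBOSE stands in for the
# PCRE 'x' modifier. Capture group numbers are noted in the comments.
ISO_8601 = re.compile(r"""
^
(?:
    ([0-9]{4})                                  # 1: four-digit year
    [\-\/:]?                                    # optional date delimiter
    (?:
        ( (?: 0[1-9]) | (?: 1[0-2]) )           # 2: two-digit month
        [\-\/:]?                                # optional date delimiter
        (?:
            ( (?: 0[1-9]) | (?: (?: 1|2)[0-9]) | (?: 3[0-1]) )  # 3: day
            [\sT]?                              # optional date/time delimiter
            (?:
                ( (?: [0-1][0-9]) | (?: 2[0-4]) )       # 4: two-digit hour
                (?:
                    :?
                    ([0-5][0-9])?                       # 5: two-digit minute
                    (?:
                        :?
                        ( [0-5][0-9] (?: \.[0-9]+)? )?  # 6: second, optional decimal
                        (                               # 7: whole offset
                            Z
                            |
                            (?:
                                ([+\-])                             # 8: offset sign
                                ( (?: [0-1][0-9]) | (?: 2[0-4]) )   # 9: offset hour
                                :?
                                ([0-5][0-9])?                       # 10: offset minute
                            )
                        )?
                    )?
                )?
            )?
        )?
    )?
)?
$
""", re.VERBOSE)

# Hypothetical field names, one per capture group, in group order.
FIELDS = ("year", "month", "day", "hour", "minute", "second",
          "offset", "offset_sign", "offset_hour", "offset_minute")


def parse_iso8601(text):
    """Return a dict of captured components, or None if text does not match."""
    match = ISO_8601.match(text)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


print(parse_iso8601("2005-08-15T15:52:01+00:00"))
print(parse_iso8601("20050815"))
```

Components that are absent from the input come back as `None`, so a date-only string such as `20050815` still matches but has no hour, minute, second, or offset. Note that the pattern only checks the shape of each field; it will happily accept a day of 31 in any month.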
What’s the craziest regex you’ve come across?