Regex from the dark lagoon

November 13th, 2013

Filed under: Computers and Technology, Work/Professional

As a software developer or system admin, have you ever run into a regular expression that is just a bit too hard to understand? Kind of frustrating, right? As a rule, regular expressions are relatively easy to write but pretty hard to read, even if you know what they are supposed to do. Then there is this bugger:

/^(?:([0-9]{4})[\-\/:]?(?:((?:0[1-9])|(?:1[0-2]))[\-\/:]?(?:((?:0[1-9])|(?:(?:1|2)[0-9])
|(?:3[0-1]))[\sT]?(?:((?:[0-1][0-9])|(?:2[0-4]))(?::?([0-5][0-9])?(?::?([0-5][0-9](?:\.[0-9]+)?)?
(Z|(?:([+\-])((?:[0-1][0-9])|(?:2[0-4])):?([0-5][0-9])?))?)?)?)?)?)?)?$/x

This is the most complex regex I’ve ever needed to write, and I just had to share it. Can you guess what it might do? 😉

I’ll skip the suspense: it’s a Perl-compatible regular expression I wrote as part of a PHP library back in 2005 to parse ISO-8601 dates into their year, month, day, hour, minute, second, and time zone components. I actually find the pattern above too dense to understand, even knowing what it is supposed to do. While writing it, I quickly realized that I had to use the ‘x’ modifier, which lets me spread the pattern over multiple lines and insert comments. Here is the more legible version:

/
^                                           # Start of the string

#-----------------------------------------------------------------------------
    (?:                                     # The date component
        ([0-9]{4})                          # Four-digit year

        [\-\/:]?                            # Optional Hyphen, slash, or colon delimiter

        (?:                                 # Two-digit month
            (
            (?:  0[1-9])
            |
            (?:  1[0-2])
            )

            [\-\/:]?                        # Optional Hyphen, slash, or colon delimiter

            (?:                                 # Two-digit day
                (
                (?:  0[1-9])
                |
                (?:  (?: 1|2)[0-9])
                |
                (?:  3[0-1])
                )

                [\sT]?                                  # Optional delimiter

            #-----------------------------------------------------------------------------      
                (?:                                     # The time component

                    (                                   # Two-digit hour
                        (?:  [0-1][0-9])
                        |
                        (?: 2[0-4])
                    )

                    (?:
                        :?                                  # Optional Colon

                        ([0-5][0-9])?                       # Two-digit minute

                        (?:
                            :?                                  # Optional Colon

                            (                                   # Two-digit second 
                                [0-5][0-9]
                                (?: \.[0-9]+)?                      # followed by an optional decimal.
                            )?

                    #-----------------------------------------------------------------------------
                            (                                   # Offset component

                                Z                               # Zero offset (UTC)
                                |                               # OR
                                (?:                             # Offset from UTC
                                    ([+\-])                     # Sign of the offset

                                    (                           # Two-digit offset hour
                                        (?:  [0-1][0-9])
                                        |
                                        (?:  2[0-4])
                                    )           

                                    :?                          # Optional Colon

                                    ([0-5][0-9])?               # Two-digit offset minute
                                )
                            )?
                        )?
                    )?
                )?
            )?
        )?
    )?

$
/x
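
To see what the capture groups actually hold, here is a minimal sketch that ports the pattern to Python's re module, where the re.VERBOSE flag plays the role of the ‘x’ modifier. This is not the original PHP library: the pattern is lightly condensed (for example, [12][0-9] in place of (?:1|2)[0-9]) while keeping the same ten capture groups in the same order, and the names ISO8601, FIELDS, and parse_iso8601 are made up for illustration.

```python
import re

# Condensed port of the pattern above; re.VERBOSE allows whitespace and comments.
ISO8601 = re.compile(r"""
    ^
    (?: ([0-9]{4})                                   # 1: four-digit year
      [\-/:]?                                        # optional delimiter
      (?: (0[1-9] | 1[0-2])                          # 2: month
        [\-/:]?                                      # optional delimiter
        (?: (0[1-9] | [12][0-9] | 3[01])             # 3: day
          [\sT]?                                     # optional date/time delimiter
          (?: ([01][0-9] | 2[0-4])                   # 4: hour
            (?: :? ([0-5][0-9])?                     # 5: minute
              (?: :? ([0-5][0-9] (?: \.[0-9]+)?)?    # 6: second, optional fraction
                ( Z | (?: ([+\-])                    # 7: whole offset, 8: sign
                          ([01][0-9] | 2[0-4])       # 9: offset hour
                          :? ([0-5][0-9])? )         # 10: offset minute
                )?
              )?
            )?
          )?
        )?
      )?
    )?
    $
""", re.VERBOSE)

# Hypothetical field names, one per capture group, in group order.
FIELDS = ("year", "month", "day", "hour", "minute", "second",
          "offset", "offset_sign", "offset_hour", "offset_minute")


def parse_iso8601(text):
    """Return a dict of matched components, or None if text does not match."""
    match = ISO8601.match(text)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


parts = parse_iso8601("2013-11-13T08:30:15.5-05:00")
print(parts["year"], parts["second"], parts["offset"])  # 2013 15.5 -05:00
```

Components that are absent from the input come back as None, so parse_iso8601("20131113") fills in only the year, month, and day. A string with an out-of-range month, such as 2013-13-01, does not match at all and the function returns None.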

What’s the craziest regex you’ve come across?
