Regex from the dark lagoon

As a software developer or system admin, have you ever encountered regular expressions that are just a bit too hard to understand? Kind of frustrating, right? As a rule, regular expressions are relatively easy to write but pretty hard to read, even if you know what they are supposed to do. Then there is this bugger:


This is the most complex regex I’ve ever had need to write and I just had to share. Can you guess what it might do? 😉

I’ll spare you the suspense: it’s a Perl-compatible regular expression I wrote as part of a PHP library back in 2005 to parse ISO-8601 dates into their year, month, day, hour, minute, second, and time zone components. I actually find the line above too dense to understand, even knowing what it is supposed to do. When trying to write it, I quickly realized that I had to use the ‘x’ modifier so I could spread it over multiple lines and insert comments. Here is the more legible version:

^                                           # Start of the line

    (?:                                     # The date component
        ([0-9]{4})                          # Four-digit year

        [\-\/:]?                            # Optional hyphen, slash, or colon delimiter

        (?:
            (                               # Two-digit month
                (?:  0[1-9])
                |
                (?:  1[0-2])
            )

            [\-\/:]?                        # Optional hyphen, slash, or colon delimiter

            (?:
                (                           # Two-digit day
                    (?:  0[1-9])
                    |
                    (?:  (?: 1|2)[0-9])
                    |
                    (?:  3[0-1])
                )
            )?
        )?
    )

    [\sT]?                                  # Optional delimiter

    (?:                                     # The time component

        (                                   # Two-digit hour
            (?:  [0-1][0-9])
            |
            (?:  2[0-4])
        )

        :?                                  # Optional colon

        ([0-5][0-9])?                       # Two-digit minute

        :?                                  # Optional colon

        (                                   # Two-digit second
            (?:  [0-5][0-9])
            (?: \.[0-9]+)?                  # followed by an optional decimal.
        )?
    )?

    (                                       # Offset component

        Z                                   # Zero offset (UTC)
        |                                   # OR
        (?:                                 # Offset from UTC
            ([+\-])                         # Sign of the offset

            (                               # Two-digit offset hour
                (?:  [0-1][0-9])
                |
                (?:  2[0-4])
            )

            :?                              # Optional colon

            ([0-5][0-9])?                   # Two-digit offset minute
        )
    )?

$                                           # End of the line

What’s the craziest regex you’ve come across?
