Recently in Regexes Category
On the perl-qa list tonight, we were discussing how best to find all the modules used in a source tree. To do the job right, you'd have to run the code and then look at the %INC:: hash, which shows the paths of all modules loaded. The low-tech and usually-good-enough solutions we came up with use ack:
$ ack -h '^use\s+(\w+(?:::\w+)*).*' --output=\$1 | sort -u
Thanks to Andy Armstrong for coming up with a better regex than mine, which assumed that the use statement would necessarily end with a semicolon.
My baby, ack, broke under Perl 5.10.0, because of a fix in regex behavior that I had been using unknowingly. See, I had always used my regex objects like this:
my $re = qr/^blah blah/; if ( $string =~ /$re/sm )...
when I should have been using it like this:
my $re = qr/^blah blah/sm; if ( $string =~ /$re/ )...
The bug in 5.8.x is that the /$re/sm would incorrectly apply the /sm modifiers to $re. This made the code happen to work, but for the wrong reason. What was especially tricky about finding my bug was that in 5.10.0, the call to /$re/sm ignores the /sm, but doesn't tell you that.
After some back and forth on p5p, a patch was submitted that gave the warning about the ignored /sm flags, but alas, Perl 5.10 was already out. It wouldn't have been so bad if it hadn't been the day AFTER it was released.
So, lesson learned: Test your code against new release candidates of Perl, both for your code's sake, AND for Perl's sake.
And y'know, now that I think of it, this is probably a great policy for Perl::Critic just waiting to happen. I wonder how many other people are doing their regexes the wrong way, too.