Contribute to Perl projects with this year’s 24 Pull Requests

November 30, 2017 Community, CPAN, Perl 5, Tools No comments

24pullrequests is an annual project that runs every December to encourage contributions to open source.  Projects are organized by technology and types of contributions that are needed.

There are only eleven Perl projects so far, so add yours to help improve visibility and maybe get some help.

Three projects that I’m involved with could use some help.

  • ack, the grep-like code search tool, is working towards a beta release for version 3.  There are many documentation changes I’d like to make in 3.000, including a cookbook, and it would be great if I could get some docs written by someone with a fresh set of eyes.
  • Perl::Critic, the static code analyzer for Perl 5
  • vim-perl, which provides the Perl syntax highlighting and other magic that happens in vim.

Leave a comment with links for other projects that need some love.

Avoid the vagueness of dies_ok() in Test::Exception

June 28, 2017 CPAN, Tools No comments

It’s good to check that your code handles error conditions correctly, but dies_ok() in Test::Exception is too blunt an instrument to do it.

Consider this code that checks that the func() subroutine dies if not passed an argument.

#!/var/perl/bin/perl

use warnings;
use strict;

use Test::More tests => 4;
use Test::Exception;

sub func {
    die 'Must pass arg' unless defined $_[0];
}

# Test for failures if the arg is not passed.
dies_ok(   sub { func() }, '#1: Dies without an argument' );
throws_ok( sub { func() }, qr/Must pass arg/, '#2: Throws without an argument' );
lives_ok(  sub { func(42) }, '#3: Lives with an argument' );

# Oops, we made a typo in our function name, but this dies_ok() still passes.
dies_ok(   sub { func_where_the_name_is_incorrect() }, '#4: Func dies without an argument' );

In case #4, the call to func_where_the_name_is_incorrect() indeed dies, but for the wrong reason. It dies because the function doesn’t exist. If we had used throws_ok instead of dies_ok like so:

throws_ok( sub { func_where_the_name_is_incorrect() }, qr/Must pass arg/, '#4: Func dies without an argument' );

then the test would have failed because the exception was incorrect:

#   Failed test '#4: Func dies without an argument'
#   at foo.t line 19.
# expecting: Regexp ((?^:Must pass arg))
# found: Undefined subroutine &main::func_where_the_name_is_incorrect called at foo.t line 19.

Why do I post this? I found an example of this in some code I was working with, where the test had been passing for the wrong reason for the past six years. Take the time to be specific in what you check for.
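
If Test::Exception isn’t available, the same specificity is possible with core modules alone. The sketch below is only an illustration: exception_from() is a hypothetical helper, not part of any module, that returns the exception text so that like() can check the reason for the death.

```perl
use strict;
use warnings;

use Test::More tests => 3;

sub func {
    die 'Must pass arg' unless defined $_[0];
}

# Hypothetical helper (not from Test::Exception): run the code and
# return the exception as a string, or undef if the code lived.
sub exception_from {
    my ($code) = @_;
    my $lived = eval { $code->(); 1 };
    return $lived ? undef : "$@";
}

like( exception_from( sub { func() } ), qr/Must pass arg/, 'Dies for the right reason' );
is(   exception_from( sub { func(42) } ), undef, 'Lives with an argument' );

# A mistyped name dies too, but not with the message we care about.
unlike( exception_from( sub { func_where_the_name_is_incorrect() } ), qr/Must pass arg/, 'Typo dies for a different reason' );
```

The point is the same as with throws_ok(): the test names the message it expects, so a death for any other reason can’t slip through as a pass.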

Dueling code wizardry is one of the things I love most about Perl

June 27, 2017 Community, CPAN 3 comments

At last week’s Perl Conference, Damian Conway talked about some new magical awesomeness he created, as he so frequently does. It’s Test::Expr, and it makes it easier to write tests:

# Write this ...                 ... instead of this.
ok $got eq $expected;            is        $got, $expected;
ok $got ne $unexpected;          isnt      $got, $unexpected;
ok $got == $expected;            cmp_ok    $got, '==', $expected;
ok $got ~~ $expected;            is_deeply $got, $expected;
ok $got =~ $pattern;             like      $got, $pattern;
ok $got !~ $pattern;             unlike    $got, $pattern;
ok $obj->isa($classname);        isa_ok    $obj, $classname;
ok $obj->can($methodname);       can_ok    $obj, $methodname;

It also improves the diagnostics by showing the expression that failed.

#   Failed test '$got eq $expected'
#   at t/synopsis.t line 13.
#   because:
#          $got --> "1.0"
#     $expected --> 1

Chad Granum, the maintainer of much of Perl’s testing infrastructure, took that last part as a challenge and overnight created his own magic in response: Test2::Plugin::SourceDiag.

use Test2::V0;
use Test2::Plugin::SourceDiag;

ok(0, "fail");

done_testing;

Produces the output:

not ok 1 - fail
Failure source code:
# ------------
# 4: ok(0, "fail");
# ------------
# Failed test 'fail'
# at test.pl line 4.

instead of:

not ok 1 - fail

#   Failed test 'fail'
#   at foo.t line 4.

This kind of dueling wizardry is one of the things that I love so much about Perl and its community.

Watch Chad’s lightning talk:

Improve your test logs with simple distro diagnostics

June 11, 2017 CPAN No comments

Automated module testing systems are becoming more and more common.  In addition to our long-serving CPAN Testers service, Perl authors can have their modules tested by Travis for Linux and Appveyor for Windows.  CPAN Testers tests each distribution uploaded to PAUSE, whereas Travis and Appveyor keep an eye on your GitHub account (or other services) and try testing after each push to the home repo.

Something that I’ve found helps with diagnosing problems is a diagnostic dump of module versions in the first test.  I’ll have a test like t/00-load.t, like this one from ack:


#!perl -T

use warnings;
use strict;
use Test::More tests => 1;

use App::Ack;   # For the VERSION
use File::Next;
use Test::Harness;
use Getopt::Long;
use Pod::Usage;
use File::Spec;

my @modules = qw(
    File::Next
    File::Spec
    Getopt::Long
    Pod::Usage
    Test::Harness
    Test::More
);

pass( 'All external modules loaded' );

diag( "Testing ack version $App::Ack::VERSION under Perl $], $^X" );
for my $module ( @modules ) {
    no strict 'refs';
    my $ver = ${$module . '::VERSION'};
    diag( "Using $module $ver" );
}

Then, when the user or automated tester runs make test, the first thing in the output tells me exactly what we’re working with.

[19:15:52] t/00-load.t .................. 1/23 # Testing ack version 2.999_01 under Perl 5.026000, /home/andy/perl5/perlbrew/perls/perl-5.26.0/bin/perl
# Using File::Next 1.16
# Using File::Spec 3.67
# Using Getopt::Long 2.49
# Using Pod::Usage 1.69
# Using Test::Harness 3.38
# Using Test::More 1.302073

This is also very useful when users have test failures and submit their logs to a bug tracker. It’s especially important in this case to show the Getopt::Long version because ack has had problems in the past with changes in its API.
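
One wrinkle with the loop above: a module that doesn’t define $VERSION would trigger an uninitialized-value warning in the diag() line. A slightly more defensive variation is sketched here. The name version_lines() is made up for this example, and it assumes Perl 5.10 or later for the // operator. It asks each module for its version through the VERSION method that every loaded class inherits.

```perl
use strict;
use warnings;
use 5.010;

use File::Spec;
use Getopt::Long;

# Hypothetical helper: build one diagnostic line per module.
# Modules that are not loaded, or that have no $VERSION, show as "unknown".
sub version_lines {
    my @modules = @_;

    my @lines;
    for my $module ( @modules ) {
        my $ver = eval { $module->VERSION } // 'unknown';
        push @lines, "Using $module $ver";
    }
    return @lines;
}

say for version_lines( qw( File::Spec Getopt::Long ) );
```

In a real test file the lines would go to diag() rather than say, so that they show up in the harness output.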

Perl::Critic 1.128 fixes bugs and works with Perl 5.26

June 11, 2017 CPAN No comments

I’ve just released a new official release of Perl::Critic, the static code analysis tool for Perl. It uses the new version of the PPI Perl-parsing module, and it works with the new Perl 5.26, which does not include . in @INC by default.

If you’ve never used Perl::Critic to analyze your code base for potential bugs and stylistic improvements, mostly based on Damian Conway’s Perl Best Practices, try it out.

Here’s the changelog:

    [Bug Fixes]
    * PPI misparsing a module caused an incorrect "Must end with a
      recognizable true value."  This is fixed by upgrading to PPI
      1.224. (GH #696, GH #607)
    * A test would fail under the upcoming Perl 5.26 that omits the current
      directory from @INC.  Thanks, Kent Fredric.
    * Fixed an invalid test in the RequireBarewordsIncludes test.  Thanks,
      Christian Walde. (GH #751)
    * If an element contained blank lines then the source "%r" displayed
      for a violation was wrong. Thanks, Sawyer X. (GH #702, #734)

    [Dependencies]
    Perl::Critic now requires PPI 1.224.  PPI is the underlying Perl parser
    on which Perl::Critic is built, and 1.224 introduces many parsing fixes
    such as:
    * Fixes for dot-in-@INC.
    * Parse left side of => as bareword even if it looks like a keyword or op.
    * $::x now works.
    * Higher accuracy when deciding whether certain characters are operators or
      variable type casts (*&% etc.).
    * Subroutine attributes parsed correctly.

    [Performance Enhancements]
    * Sped up BuiltinFunctions::ProhibitUselessTopic ~7%.  Thanks, James
      Raspass. (GH #656)

    [Documentation]
    * Fixed incorrect explanation of capture variables in
      ProhibitCaptureWithoutTest.  Thanks, Felipe Gasper.
    * Fixed incorrect links. Thanks, Glenn Fowler.
    * Fixed incorrect example for returning a sorted list.  Thanks, @daviding58.
    * Fixed invalid POD.  Thanks, Jakub Wilk. (GH #735)
    * Updated docs on ProhibitYadaOperator.  Thanks, Stuart A Johnston. (GH #662)
    * Removed all the references to the old mailing list and code repository
      at tigris.org.  (GH #757)

Perl::Critic releases its first new developer release in 21 months

May 26, 2017 CPAN 2 comments

I’ve just released a new developer release of Perl::Critic, the static code analysis tool for Perl, as we work toward its first new release in 21 months. This version of Perl::Critic fixes a few bugs and relies on a new release of the underlying Perl parsing library PPI, which also has had its first new release in a while.

This version of Perl::Critic is also ready for the impending release of Perl 5.26, which will no longer include . in @INC by default.
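
For code that relied on the old behavior, such as test scripts that load helper modules relative to the current directory, a common remedy is to add the directory explicitly. A minimal sketch, assuming a hypothetical lib/ directory next to the script (dot_in_inc() is likewise a name invented for the example):

```perl
use strict;
use warnings;

use FindBin qw( $Bin );
use lib "$Bin/lib";    # Hypothetical layout: modules live in lib/ next to the script

# Report whether the current directory is still being searched.
sub dot_in_inc {
    return scalar grep { !ref($_) && $_ eq '.' } @INC;
}

print "Added to \@INC: $Bin/lib\n";
print dot_in_inc() ? "'.' is in \@INC\n" : "'.' is not in \@INC\n";
```

Anchoring the path to the script’s own location, rather than to wherever the script happens to be run from, is what makes this safer than relying on . in @INC.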

I’ve been spending some time working through the issues in the GitHub project, cleaning up what I can and clarifying others.

If you’ve never used Perl::Critic to analyze your code base for potential bugs and stylistic improvements, mostly based on Damian Conway’s Perl Best Practices, try it out.

Here’s the changelog:

    [Bug Fixes]
    * PPI misparsing a module caused an incorrect "Must end with a
      recognizable true value."  This is fixed by upgrading to PPI
      1.224. (GH #696, GH #607)
    * A test would fail under the upcoming Perl 5.26 that omits the current
      directory from @INC.  Thanks, Kent Fredric.
    * Fixed an invalid test in the RequireBarewordsIncludes test.  Thanks,
      Christian Walde. (GH #751)
    * If an element contained blank lines then the source "%r" displayed
      for a violation was wrong. Thanks, Sawyer X. (GH #702, #734)

    [Dependencies]
    Perl::Critic now requires PPI 1.224.  PPI is the underlying Perl parser
    on which Perl::Critic is built, and 1.224 introduces many parsing fixes
    such as:
    * Fixes for dot-in-@INC.
    * Parse left side of => as bareword even if it looks like a keyword or op.
    * $::x now works.
    * Higher accuracy when deciding whether certain characters are operators or
      variable type casts (*&% etc.).
    * Subroutine attributes parsed correctly.

    [Performance Enhancements]
    * Sped up BuiltinFunctions::ProhibitUselessTopic ~7%.  Thanks, James
      Raspass. (GH #656)

    [Documentation]
    * Fixed incorrect explanation of capture variables in
      ProhibitCaptureWithoutTest.  Thanks, Felipe Gasper.
    * Fixed incorrect links. Thanks, Glenn Fowler.
    * Fixed incorrect example for returning a sorted list.  Thanks, @daviding58.
    * Fixed invalid POD.  Thanks, Jakub Wilk. (GH #735)
    * Updated docs on ProhibitYadaOperator.  Thanks, Stuart A Johnston. (GH #662)
    * Removed all the references to the old mailing list and code repository
      at tigris.org.  (GH #757)

Speed up DBI reads by binding variables directly

April 27, 2017 CPAN 1 comment

If you’re using DBI directly for your database access, not through some ORM, then fetchrow_hashref is probably the handiest way to fetch result rows. However, if you’re working on lots of rows and time is critical, know that it is also the slowest way to do so.

Here’s a benchmark that shows that binding columns with bind_columns takes about half the runtime of fetchrow_hashref.


use strict;
use warnings;
use 5.010;

use Benchmark ':hireswallclock';

our $ITERATIONS = 1_000_000;
our $sth;

sub prep_handle {
    my $sql = <<"EOF";
    SELECT title, author, isbn
    FROM title
    WHERE ROWNUM < $ITERATIONS
EOF
    return sqldo_handle( $sql );  # Calls DBI->prepare
}

sub hashref {
    while ( my $row = $sth->fetchrow_hashref ) {
        my $title  = $row->{title};
        my $author = $row->{author};
        my $isbn   = $row->{isbn};
    }
    $sth->finish;
}

sub array {
    while ( my @row = $sth->fetchrow_array ) {
        my ($title,$author,$isbn) = @row;
    }
    $sth->finish;
}

sub arrayref {
    while ( my $row = $sth->fetchrow_arrayref ) {
        my $title  = $row->[0];
        my $author = $row->[1];
        my $isbn   = $row->[2];
    }
    $sth->finish;
}

sub direct_bind {
    $sth->bind_columns( \my $title, \my $author, \my $isbn );
    while ( my $row = $sth->fetch ) {
        # No need to copy.
    }
    $sth->finish;
}

say "Running $ITERATIONS iterations";
for my $func ( qw( hashref array arrayref direct_bind ) ) {
    $sth = prep_handle();
    my $t = timeit( 1, "$func()" );
    printf( "%-11s took %s\n", $func, timestr($t) );
}

Which gives these results

$ ./dbi-bind-bench
Running 1000000 iterations
hashref     took 7.37747 wallclock secs ( 4.98 usr +  0.25 sys =  5.23 CPU) @  0.19/s (n=1)
array       took 4.01768 wallclock secs ( 1.68 usr +  0.19 sys =  1.87 CPU) @  0.53/s (n=1)
arrayref    took 3.86365 wallclock secs ( 1.60 usr +  0.16 sys =  1.76 CPU) @  0.57/s (n=1)
direct_bind took 3.36962 wallclock secs ( 1.13 usr +  0.15 sys =  1.28 CPU) @  0.78/s (n=1)

When speed is key, bind your output variables directly.
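
To put numbers on that, here is a quick arithmetic sketch using the wallclock times from the run above. The helper name percent_of_hashref() is made up for the example.

```perl
use strict;
use warnings;

# Wallclock seconds copied from the benchmark output above.
my %wallclock = (
    hashref     => 7.37747,
    array       => 4.01768,
    arrayref    => 3.86365,
    direct_bind => 3.36962,
);

# Express a method's runtime as a percentage of fetchrow_hashref's.
sub percent_of_hashref {
    my ($method) = @_;
    return 100 * $wallclock{$method} / $wallclock{hashref};
}

# Slowest first, to match the order of the benchmark output.
for my $method ( sort { $wallclock{$b} <=> $wallclock{$a} } keys %wallclock ) {
    printf( "%-11s %5.1f%% of hashref\n", $method, percent_of_hashref($method) );
}
```

By wallclock time, direct binding comes in at a bit under half of fetchrow_hashref, and the two other fetchrow variants land in between.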

ack 2.18 has been released; ack 3 starting development

March 24, 2017 CPAN, Tools No comments

I’ve just uploaded ack 2.18 to CPAN and to https://beyondgrep.com/.

ack 2.18 will probably be the final release in the ack 2.x series. I’m going to be starting work on ack 3.000 in earnest.  Still, if you discover problems with ack 2, please report them to https://github.com/petdance/ack2/issues

If you’re interested in ack 3 development, please sign up for the ack-dev mailing list and/or join the ack Slack.  See https://beyondgrep.com/community/ for details.

2.18    Fri Mar 24 14:53:19 CDT 2017
====================================
[FIXES]
ack no longer throws an undefined variable warning if it's called
from a directory that doesn't exist. (GH #634)

--context=0 (and its short counterpart -C 0) did not set the context
to 0.  This means that a command-line --context=0 couldn't override
a --context=5 in your ackrc.  Thanks, Ed Avis.  (GH #595)

t/ack-s.t would fail in non-English locales.  Thanks, Olivier Mengué.
(GH #485, GH #515)

[ENHANCEMENTS]
--after-context and --before-context (and their short counterparts
-A and -B) no longer require a value to be passed.  If no value is
set, they default to 2. (GH #351)

Added .xhtml to the --html filetype.  Added .wsdl to the --xml filetype.
Thanks, H.Merijn Brand.  (GH #456)

[DOCUMENTATION]
Explain that filetypes must be two characters or longer. (GH #389)

Updated incorrect docs about how ack works.  Thanks, Gerhard Poul.
(GH #543)

[INTERNALS]
Removed the abstraction of App::Ack::Resource and its subclass
App::Ack::Resource::Basic.  We are abandoning the idea that we'll have
plugins.

Removed dependency on File::Glob which isn't used.
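
How ack implements the optional value for -A and -B isn’t shown in the changelog. One way to get the same effect is Getopt::Long’s optional-value spec with a default number. The sketch below is only that, a sketch: parse_context_opts() is a hypothetical function, and ack’s real option handling is more involved.

```perl
use strict;
use warnings;

use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical option parser for illustration only.  The ":2" spec means
# "the integer value is optional, and is 2 if omitted".
sub parse_context_opts {
    my @args = @_;

    my %opt;
    GetOptionsFromArray( \@args, \%opt,
        'after-context|A:2',
        'before-context|B:2',
    ) or die "Could not parse options\n";

    return \%opt;
}

my $opt = parse_context_opts( '-A', '--before-context=5', 'pattern' );
print "after=$opt->{'after-context'} before=$opt->{'before-context'}\n";
```

Because the value is optional, an argument that doesn’t look like a number is left alone for later processing, which is what lets a search pattern follow a bare -A.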

ack 2.16 has been released

March 10, 2017 CPAN, Tools No comments

ack 2.16 has been released.  ack is a grep-like tool optimized for searching source code.  It’s available at https://beyondgrep.com, or via CPAN using App::Ack.

Here are the changes between 2.16 and 2.14.

2.16    Fri Mar 10 13:32:39 CST 2017
====================================
[CONFUSING BEHAVIOR & UPCOMING CHANGES]
The -w flag has a confusing behavior that it's had since ack 1.x
and that will be changing in the future.  It's not changing in this
version, but this is a heads-up that it's coming.

ack -w is "match a whole word", and ack does this by turning
your PATTERN into \bPATTERN\b.  So "ack -w foo" effectively becomes
"ack \bfoo\b".  Handy.

The problem is that ack doesn't put a \b before PATTERN if it begins
with a non-word character, and won't put a \b after PATTERN if it
ends with a non-word character.

The result is that if you're searching for "fool" or "foot", but
only as a word, and you do "ack -w foo[lt]" or "ack -w (fool|foot)",
you'll get matches for "football" and "foolish", which certainly should
not match if you're using -w.


[ENHANCEMENTS]
Include .cljs, .cljc and .edn files with the --clojure filetype.  Thanks,
Austin Chamberlin.

Added .xsd to the --xml filetype.  Thanks, Nick Morrott.

Added support for Swift language.  Thanks, Nikolaj Schumacher. (GH #512)

The MSYS2 project is now seen as Windows.  Thanks, Ray Donnelly. (GH #450)

Expand the definition of OCaml files.  Thanks, Marek Kubica. (GH #511)

Add support for Groovy Server Pages.  Thanks, Ethan Mallove. (GH #469)

The JSP filetype (--jsp) now recognizes .jspf files.  Thanks, Sebastien
Feugere.  (GH #586)

Added --hpp option for C++ header files.  Thanks, Steffen Jaeckel.

ack now supports --ignore-dir=match:....  Thanks, Ailin Nemui! (GitHub ticket #42)

ack also supports --ignore-dir=ext:..., and --noignore-dir supports match/ext as well


[FIXES]
Reverted an optimization to make \s work properly again. (GH #572,
GH #571, GH #562, GH #491, GH #498)

The -l and -c flags would sometimes return inaccurate results due to
a bug introduced in 2.14.  Thanks to Elliot Shank for the report! (GH #491)

Behavior when using newlines in a search was inconsistent.  Thanks to Yves Chevallier
for the report! (GH #522)

Don't ignore directories that are specified as command line targets (GH #524)

Fix a bug where a regular expression that matches the empty string could cause ack
to go into an infinite loop (GH #542)


[INTERNALS]
Add minimal requirement of Getopt::Long 2.38, not 2.35, for GetOptionsFromString.

Added test to test --output. Thanks, Varadinsky! (GH #587, GH #590)

Added test to make sure subdirs of target subdirs are ignored if
--ignore-dir applies to them.  Thanks, Pete Houston. (GH #570)

Many optimizations and code cleanups.  Thanks, Stephan Hohe.

Fixed an out-of-date FAQ entry.  Thanks, Jakub Wilk.  (GH #580)

[DOCUMENTATION]
Expanded the explanation of how the -w flag works.  Thanks, Ed Avis.
(GH #585)
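
The -w behavior described in the changelog above can be reproduced in a few lines. This is a hypothetical reconstruction from that description, not ack’s actual source, and wrap_strict() is just one possible stricter approach, not the fix that ack has announced.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the -w behavior described in the changelog:
# add \b only on the sides where PATTERN starts or ends with a word character.
sub wrap_like_ack_w {
    my ($pattern) = @_;
    my $before = $pattern =~ /^\w/ ? '\b' : '';
    my $after  = $pattern =~ /\w$/ ? '\b' : '';
    return $before . $pattern . $after;
}

# One possible stricter alternative (an assumption for illustration):
# group the whole pattern and anchor both ends.
sub wrap_strict {
    my ($pattern) = @_;
    return '\b(?:' . $pattern . ')\b';
}

for my $pattern ( 'foo', 'foo[lt]', '(fool|foot)' ) {
    my $loose  = wrap_like_ack_w($pattern);
    my $strict = wrap_strict($pattern);
    for my $word ( qw( fool football foolish ) ) {
        printf( "%-12s vs %-8s  loose: %-3s  strict: %s\n",
            $pattern, $word,
            ( $word =~ /$loose/  ? 'yes' : 'no' ),
            ( $word =~ /$strict/ ? 'yes' : 'no' ),
        );
    }
}
```

With the loose wrapping, foo[lt] gets a \b only at the front and (fool|foot) gets none at all, which is how "football" and "foolish" sneak through.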

Higher-level list utility functions with List::UtilsBy

February 13, 2017 CPAN, Perl 5 2 comments

I’m in love with List::UtilsBy. It’s one of those “Why didn’t someone do this earlier?” modules (or maybe it’s “Why didn’t I know about it earlier?”). It replicates much of the functionality of List::Util, but its functions operate on a key computed by an arbitrary block.

Convenience

I’ve always been annoyed at having to repeat the field name in a sort sub:

my @sorted = sort { $a->name cmp $b->name } @users;

Now it’s just this:

my @sorted = sort_by { $_->name } @users;

I can’t imagine how many times I’ve written something like this to build a hash of the counts of something based on a list:

# Tally up a list of first letters of everyone's name.
my %n;
for my $i ( @users ) {
    ++$n{ substr($i->name,0,1) };
}

Now, with List::UtilsBy, I can do this:

my %n = count_by { substr($_->name,0,1) } @users;

How about getting the user with the highest salary?

my $highest_paid;
my $max_salary = 0;
for my $user ( @users ) {
    if ( (my $val = $user->salary) > $max_salary ) {
        $highest_paid = $user;
        $max_salary = $val;
    }
}
say $highest_paid->name, ' is the highest paid.';

Or I can just do this:

my $highest_paid = max_by { $_->salary } @users;

Performance

For most of the functions I’ve tried, List::UtilsBy is slower than the hand-written loop. Sorting is the exception: when the key is expensive to compute, sort_by is faster than a plain sort because it evaluates the key exactly once for each element of the list. In my tests it was even faster than rolling my own Schwartzian transform. For complex key methods, the time savings will be dramatic.
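
To see why evaluating the key once matters, here is a minimal sketch of the idea. my_sort_by() is a hypothetical stand-in written for this example, not the module’s actual source. It computes every key up front and then sorts the indexes, while the counters show how often the key is computed each way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for sort_by, to show the idea only.
sub my_sort_by (&@) {
    my $keygen = shift;
    my @keys = map { scalar $keygen->($_) } @_;     # one key per element
    return @_[ sort { $keys[$a] cmp $keys[$b] } 0 .. $#_ ];
}

my @names = qw( pear Apple fig Banana );

# Count key computations when the keys are cached.
my $by_calls = 0;
my @sorted = my_sort_by { $by_calls++; lc $_ } @names;

# Count key computations in a plain sort: two per comparison.
my $plain_calls = 0;
my @plain = sort { $plain_calls += 2; lc($a) cmp lc($b) } @names;

print "sorted: @sorted\n";
print "key computations: my_sort_by=$by_calls, plain sort=$plain_calls\n";
```

The plain sort recomputes both keys on every comparison, so its count grows with the number of comparisons rather than the number of elements. That difference is small for a hash lookup and large for a method call.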

I used Benchmark to run a test of 1000 sorts of 5000 input records.

timethese( 1000, {
    # Key comes from a hash lookup.
    lookup_raw_____ => sub { @x = sort { $a->{name} cmp $b->{name} } @input },
    lookup_schwartz => sub { @x = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [$_, $_->{name}] } @input },
    lookup_utilsby_ => sub { @x = sort_by { $_->{name} } @input },

    # Key comes from a method call.
    method_raw_____ => sub { @x = sort { $a->name cmp $b->name } @input },
    method_schwartz => sub { @x = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [$_, $_->name] } @input },
    method_utilsby_ => sub { @x = sort_by { $_->name } @input },
} );

Giving these results:

# Key comes from a hash lookup.
lookup_raw_____: 14 wallclock secs @ 72.52/s
lookup_schwartz: 19 wallclock secs @ 53.28/s
lookup_utilsby_: 15 wallclock secs @ 64.47/s

# Key comes from a method call.
method_raw_____: 54 wallclock secs @ 18.64/s
method_schwartz: 22 wallclock secs @ 45.64/s
method_utilsby_: 17 wallclock secs @ 59.59/s

In most cases, I’m more than happy to burn a few milliseconds in the name of simplicity and less code.

Thanks to Paul Evans for putting this together. And thanks to Dave Rolsky for putting out List::AllUtils which brings together List::Util, List::MoreUtils and List::UtilsBy under one handy umbrella.