CPAN

Use Getopt::Long even if you don’t think you need to

August 3, 2018 CPAN, Perl 5 No comments

Note: This is a post from 2008 that I dug up that still applies today.

There’s a thread over on perlmonks that discusses Tom Christiansen’s assertion that you should use Getopt::Long by default, even when you only have one command-line argument to parse:

What seems to happen is that at first we just want to add–oh say for example JUST ONE, SINGLE LITTLE -v flag. Well, that’s so easy enough to hand-hack, that of course we do so… But just like any other piece of software, these things all seem to have a way of overgrowing their original expectations… Getopt::Long is just *wonderful*, up–I believe–to any job you can come up with for it. Too often its absence means that I’ve in the long run made more work for myself–or others–by not having used it originally. [Emphasis mine — Andy]

I can’t agree more. I don’t care if you use Getopt::Long or Getopt::Declare or Getopt::Lucid or any of the other variants out there. You know know know that you’re going to add more arguments down the road, so why not start out right?

Yes, it can be tricky to get through all of its magic if you’re unfamiliar with it, but it’s pretty obvious when you see enough examples. Take a look at prove or ack for examples. mech-dump is pretty decent as an example as well:

GetOptions(
    'user=s'        => \$user,
    'password=s'    => \$pass,
    forms           => sub { push( @actions, \&dump_forms ); },
    links           => sub { push( @actions, \&dump_links ); },
    images          => sub { push( @actions, \&dump_images ); },
    all             => sub { push( @actions, \&dump_forms, \&dump_links, \&dump_images ); },
    absolute        => \$absolute,
    'agent=s'       => \$agent,
    'agent-alias=s' => \$agent_alias,
    help            => sub { pod2usage(1); },
) or pod2usage(2);

Where the value for an option is a variable reference, the option’s value gets stored there. Where it’s a sub, that sub gets called with the option name and its value. That’s the basics, and you don’t have to worry about anything else. Your user can pass --abs instead of --absolute if it’s unambiguous. You can have options with mandatory values, as in agent=s, where --agent must take a string. On and on, it’s probably got the functionality you need.
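Here is a minimal, self-contained sketch of those basics. It is not taken from mech-dump: the argument list is made up, and it uses GetOptionsFromArray (exported by Getopt::Long on request) so that it parses a local array instead of the real @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical command line, as if the user had typed:
#   --abs --agent=Mozilla --links
my @args = ( '--abs', '--agent=Mozilla', '--links' );

my $absolute;
my $agent;
my @actions;

my $ok = GetOptionsFromArray(
    \@args,
    'absolute' => \$absolute,    # Plain flag: set to 1 when present
    'agent=s'  => \$agent,       # Option that must be given a string value
    'links'    => sub { push( @actions, 'dump_links' ) },    # Callback: runs when the option is seen
);

print "absolute=$absolute agent=$agent actions=@actions\n";
```

The --abs in the argument list is accepted as --absolute because no other option in this sketch starts with "abs". That relies on Getopt::Long’s default abbreviation handling being left switched on.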

One crucial reminder: You must check the return code of GetOptions. Otherwise, your program will carry on. If someone gives your program an invalid argument on the command-line, then you know that the program cannot possibly be running in the way the user intended. Your program must stop immediately.

Not checking the return of GetOptions is as bad as not checking the return of open. In fact, I think I smell a new Perl::Critic policy….
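To make the failure mode concrete, here is a small sketch with a made-up argument list containing an option the program does not define. The warning handler is only there so the sketch can show the message; a real program would normally let the warning go to STDERR and then stop, for example with pod2usage(2).

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical arguments: one valid option, one the program doesn't know.
my @args = ( '--verbose', '--no-such-option' );

my $verbose;
my $warning = '';
my $parsed_ok;
{
    # Getopt::Long reports bad options with warn().  Capture the text
    # here only so this sketch can print it itself.
    local $SIG{__WARN__} = sub { $warning .= $_[0] };
    $parsed_ok = GetOptionsFromArray( \@args, 'verbose' => \$verbose );
}

if ( !$parsed_ok ) {
    # The valid option was still processed, so an unchecked program
    # would carry on in a half-configured state.
    print "verbose was still set to $verbose\n";
    print "Refusing to run: $warning";
}
```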

ack 2.24 is released, speeding up common cases

June 21, 2018 CPAN, Tools No comments

I’ve just uploaded a new version of ack, version 2.24, to the CPAN and posted it to beyondgrep.com.

This version of ack introduces an optimization to not search most files line-by-line until after ack has read the whole file and determined that the pattern matches in the file at all. This speeds things up quite a bit. Here are some timings in seconds for acking against the Drupal codebase. It compares ack 2.22, the just-released 2.24 and the latest beta of ack version 3:
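The idea can be sketched in a few lines of Perl. This is not ack’s actual code: search_text is a hypothetical helper that works on text already read into memory, and it ignores details such as anchors, where a pattern like ^foo can match a single line but not the buffer as a whole.

```perl
use strict;
use warnings;

# Hypothetical helper, not ack's real implementation.
# Returns the matching lines as "lineno:text".
sub search_text {
    my ( $text, $re ) = @_;

    # Cheap pre-check: a single regex pass over the whole buffer.
    return () unless $text =~ $re;

    # Only text that passed the pre-check pays for the per-line scan.
    my @hits;
    my $lineno = 0;
    for my $line ( split /\n/, $text ) {
        ++$lineno;
        push( @hits, "$lineno:$line" ) if $line =~ $re;
    }
    return @hits;
}

my $text    = "alpha\nfoo bar\nbaz\nfood\n";
my @found   = search_text( $text, qr/foo/ );
my @missing = search_text( $text, qr/zqj-not-there/ );

print "$_\n" for @found;
```

For the second search the per-line loop never runs. In a large codebase most files fall into that case for most patterns, so skipping the loop there is where a saving like this would come from.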

                                    |   2.22 |   2.24 | 3 beta
--------------------------------------------------------------
          ack zqj /home/andy/drupal |   1.93 |   1.53 |   1.46
ack zqj-not-there /home/andy/drupal |   1.85 |   1.50 |   1.45
          ack foo /home/andy/drupal |   1.99 |   1.66 |   1.62
       ack foo -w /home/andy/drupal |   1.94 |   1.63 |   1.57
  ack foo\w+ -C10 /home/andy/drupal |   4.54 |   2.13 |   1.97
ack (set|get)_\w+ /home/andy/drupal |   2.01 |   1.79 |   1.79

Of course ack doesn’t have the raw speed of a tool like ripgrep, but it’s got a different feature set. See this feature comparison chart of greplike tools for details.

Contribute to Perl projects with this year’s 24 Pull Requests

November 30, 2017 Community, CPAN, Perl 5, Tools 1 comment

24pullrequests is an annual project that runs every December to encourage contributions to open source.  Projects are organized by technology and types of contributions that are needed.

There are only eleven Perl projects so far, so add yours to help improve visibility and maybe get some help.

Three projects that I’m involved with could use some help.

  • ack, the grep-like code search tool, is working towards a beta release for version 3.  There are many documentation changes I’d like to make in 3.000, including a cookbook, and it would be great if I could get some docs written by someone with a fresh set of eyes.
  • Perl::Critic, the static code analyzer for Perl 5
  • vim-perl is all the syntax highlighting and other magic that happens in vim.

Leave a comment with links for other projects that need some love.

Avoid the vagueness of dies_ok() in Test::Exception

June 28, 2017 CPAN, Tools No comments

It’s good to check that your code handles error conditions correctly, but dies_ok() in Test::Exception is too blunt an instrument to do it.

Consider this code that checks that the func() subroutine dies if not passed an argument.

#!/var/perl/bin/perl

use warnings;
use strict;

use Test::More tests => 4;
use Test::Exception;

sub func {
    die 'Must pass arg' unless defined $_[0];
}

# Test for failures if the arg is not passed.
dies_ok(   sub { func() }, '#1: Dies without an argument' );
throws_ok( sub { func() }, qr/Must pass arg/, '#2: Throws without an argument' );
lives_ok(  sub { func(42) }, '#3: Lives with an argument' );

# Oops, we made a typo in our function name, but this dies_ok() still passes.
dies_ok(   sub { func_where_the_name_is_incorrect() }, '#4: Func dies without an argument' );

In case #4, the call to func_where_the_name_is_incorrect() indeed dies, but for the wrong reason. It dies because the function doesn’t exist. If we had used throws_ok instead of dies_ok like so:

throws_ok( sub { func_where_the_name_is_incorrect() }, qr/Must pass arg/, '#4: Func dies without an argument' );

then the test would have failed because the exception was incorrect:

#   Failed test '#4: Func dies without an argument'
#   at foo.t line 19.
# expecting: Regexp ((?^:Must pass arg))
# found: Undefined subroutine &main::func_where_the_name_is_incorrect called at foo.t line 19.

Why do I post this? I found an example of this in some code I was working with, where the test had been passing for the wrong reason for the past six years. Take the time to be specific in what you check for.
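If Test::Exception isn’t available, the same specificity is possible with core Test::More alone. This is only a sketch of the idea, with func() copied from the example above: run the call under eval, save the error, and match on the message rather than on the mere fact that something died.

```perl
use strict;
use warnings;
use Test::More tests => 2;

sub func {
    die "Must pass arg\n" unless defined $_[0];
    return 1;
}

# Run the call under eval and copy the error right away,
# before any later code has a chance to reset $@.
my $lived = eval { func(); 1 };
my $error = $@;

ok( !$lived, 'func() dies without an argument' );
like( $error, qr/Must pass arg/, '... and dies for the expected reason' );
```

A typo in the function name would still pass the first check, but it would fail the second, because the error would be "Undefined subroutine" instead of the expected message.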

Dueling code wizardry is one of the things I love most about Perl

June 27, 2017 Community, CPAN 3 comments

At last week’s Perl Conference, Damian Conway talked about some new magical awesomeness he created, as he so frequently does. It’s Test::Expr, and it makes it easier to write tests:

# Write this ...                 ... instead of this.
ok $got eq $expected;            is        $got, $expected;
ok $got ne $unexpected;          isnt      $got, $unexpected;
ok $got == $expected;            cmp_ok    $got, '==', $expected;
ok $got ~~ $expected;            is_deeply $got, $expected;
ok $got =~ $pattern;             like      $got, $pattern;
ok $got !~ $pattern;             unlike    $got, $pattern;
ok $obj->isa($classname);        isa_ok    $obj, $classname;
ok $obj->can($methodname);       can_ok    $obj, $methodname;

It also improves the diagnostics by showing the expression that failed.

#   Failed test '$got eq $expected'
#   at t/synopsis.t line 13.
#   because:
#          $got --> "1.0"
#     $expected --> 1

Chad Granum, the maintainer of much of Perl’s testing infrastructure, took that last part as a challenge and overnight created his own magic in response: Test2::Plugin::SourceDiag.

use Test2::V0;
use Test2::Plugin::SourceDiag;

ok(0, "fail");

done_testing;

Produces the output:

not ok 1 - fail
Failure source code:
# ------------
# 4: ok(0, "fail");
# ------------
# Failed test 'fail'
# at test.pl line 4.

instead of:

not ok 1 - fail

#   Failed test 'fail'
#   at foo.t line 4.

This kind of dueling wizardry is one of the things that I love so much about Perl and its community.

Watch Chad’s lightning talk:

Improve your test logs with simple distro diagnostics

June 11, 2017 CPAN No comments

Automated module testing systems are becoming more and more common.  In addition to our long-serving CPAN Testers service, Perl authors can have their modules tested by Travis for Linux and Appveyor for Windows.  CPAN Testers tests each distribution uploaded to PAUSE, whereas Travis and Appveyor keep an eye on your GitHub account (or other services) and try testing after each push to the home repo.

Something I’ve found that helps with diagnosing problems is having a diagnostic dump of module versions in the first test.  I’ll have a test like t/00-modules.t, like this one from ack:


#!perl -T

use warnings;
use strict;
use Test::More tests => 1;

use App::Ack;   # For the VERSION
use File::Next;
use Test::Harness;
use Getopt::Long;
use Pod::Usage;
use File::Spec;

my @modules = qw(
    File::Next
    File::Spec
    Getopt::Long
    Pod::Usage
    Test::Harness
    Test::More
);

pass( 'All external modules loaded' );

diag( "Testing ack version $App::Ack::VERSION under Perl $], $^X" );
for my $module ( @modules ) {
    no strict 'refs';
    my $ver = ${$module . '::VERSION'};
    diag( "Using $module $ver" );
}

Then, when the user or automated tester runs make test, the first thing out tells me exactly what we’re working with.

[19:15:52] t/00-load.t .................. 1/23 # Testing ack version 2.999_01 under Perl 5.026000, /home/andy/perl5/perlbrew/perls/perl-5.26.0/bin/perl
# Using File::Next 1.16
# Using File::Spec 3.67
# Using Getopt::Long 2.49
# Using Pod::Usage 1.69
# Using Test::Harness 3.38
# Using Test::More 1.302073

This is also very useful when users have test failures and submit their logs to a bug tracker. It’s especially important in this case to show the Getopt::Long version because ack has had problems in the past with some changes in its API.

Perl::Critic 1.128 fixes bugs and works with Perl 5.26

June 11, 2017 CPAN No comments

I’ve just released a new official release of Perl::Critic, the static code analysis tool for Perl. It uses the new version of the PPI Perl-parsing module, and it works with the new Perl 5.26, which does not include . in @INC by default.

If you’ve never used Perl::Critic to analyze your code base for potential bugs and stylistic improvements, mostly based on Damian Conway’s Perl Best Practices, try it out.

Here’s the changelog:

    [Bug Fixes]
    * PPI misparsing a module caused an incorrect "Must end with a
      recognizable true value."  This is fixed by upgrading to PPI
      1.224. (GH #696, GH #607)
    * A test would fail under the upcoming Perl 5.26 that omits the current
      directory from @INC.  Thanks, Kent Fredric.
    * Fixed an invalid test in the RequireBarewordsIncludes test.  Thanks,
      Christian Walde. (GH #751)
    * If an element contained blank lines then the source "%r" displayed
      for a violation was wrong. Thanks, Sawyer X. (GH #702, #734)

    [Dependencies]
    Perl::Critic now requires PPI 1.224.  PPI is the underlying Perl parser
    on which Perl::Critic is built, and 1.224 introduces many parsing fixes
    such as:
    * Fixes for dot-in-@INC.
    * Parse left side of => as bareword even if it looks like a keyword or op.
    * $::x now works.
    * Higher accuracy when deciding whether certain characters are operators or
      variable type casts (*&% etc.).
    * Subroutine attributes parsed correctly.

    [Performance Enhancements]
    * Sped up BuiltinFunctions::ProhibitUselessTopic ~7%.  Thanks, James
      Raspass. (GH #656)

    [Documentation]
    * Fixed incorrect explanation of capture variables in
      ProhibitCaptureWithoutTest.  Thanks, Felipe Gasper.
    * Fixed incorrect links. Thanks, Glenn Fowler.
    * Fixed incorrect example for returning a sorted list.  Thanks, @daviding58.
    * Fixed invalid POD.  Thanks, Jakub Wilk. (GH #735)
    * Updated docs on ProhibitYadaOperator.  Thanks, Stuart A Johnston. (GH #662)
    * Removed all the references to the old mailing list and code repository
      at tigris.org.  (GH #757)

Perl::Critic releases its first new developer release in 21 months

May 26, 2017 CPAN 2 comments

I’ve just released a new developer release of Perl::Critic, the static code analysis tool for Perl, as we work toward its first new release in 21 months. This version of Perl::Critic fixes a few bugs and relies on a new release of the underlying Perl parsing library PPI, which also has had its first new release in a while.

This version of Perl::Critic is also ready for the impending release of Perl 5.26, which will no longer include . in @INC by default.

I’ve been spending some time working through the issues in the GitHub project, cleaning up what I can and clarifying others.

If you’ve never used Perl::Critic to analyze your code base for potential bugs and stylistic improvements, mostly based on Damian Conway’s Perl Best Practices, try it out.

Here’s the changelog:

    [Bug Fixes]
    * PPI misparsing a module caused an incorrect "Must end with a
      recognizable true value."  This is fixed by upgrading to PPI
      1.224. (GH #696, GH #607)
    * A test would fail under the upcoming Perl 5.26 that omits the current
      directory from @INC.  Thanks, Kent Fredric.
    * Fixed an invalid test in the RequireBarewordsIncludes test.  Thanks,
      Christian Walde. (GH #751)
    * If an element contained blank lines then the source "%r" displayed
      for a violation was wrong. Thanks, Sawyer X. (GH #702, #734)

    [Dependencies]
    Perl::Critic now requires PPI 1.224.  PPI is the underlying Perl parser
    on which Perl::Critic is built, and 1.224 introduces many parsing fixes
    such as:
    * Fixes for dot-in-@INC.
    * Parse left side of => as bareword even if it looks like a keyword or op.
    * $::x now works.
    * Higher accuracy when deciding whether certain characters are operators or
      variable type casts (*&% etc.).
    * Subroutine attributes parsed correctly.

    [Performance Enhancements]
    * Sped up BuiltinFunctions::ProhibitUselessTopic ~7%.  Thanks, James
      Raspass. (GH #656)

    [Documentation]
    * Fixed incorrect explanation of capture variables in
      ProhibitCaptureWithoutTest.  Thanks, Felipe Gasper.
    * Fixed incorrect links. Thanks, Glenn Fowler.
    * Fixed incorrect example for returning a sorted list.  Thanks, @daviding58.
    * Fixed invalid POD.  Thanks, Jakub Wilk. (GH #735)
    * Updated docs on ProhibitYadaOperator.  Thanks, Stuart A Johnston. (GH #662)
    * Removed all the references to the old mailing list and code repository
      at tigris.org.  (GH #757)

Speed up DBI reads by binding variables directly

April 27, 2017 CPAN 1 comment

If you’re using DBI directly for your database access, not through some ORM, then fetchrow_hashref is probably the handiest way to fetch result rows. However, if you’re working on lots of rows and time is critical, know that it is also the slowest way to do so.

Here’s a benchmark that shows that binding columns with bind_columns takes less than half the runtime of fetchrow_hashref.


use strict;
use warnings;
use 5.010;

use Benchmark ':hireswallclock';

our $ITERATIONS = 1_000_000;
our $sth;

sub prep_handle {
    my $sql = <<"EOF";
    SELECT title, author, isbn
    FROM title
    WHERE ROWNUM < $ITERATIONS
EOF
    return sqldo_handle( $sql );  # Calls DBI->prepare
}

sub hashref {
    while ( my $row = $sth->fetchrow_hashref ) {
        my $title  = $row->{title};
        my $author = $row->{author};
        my $isbn   = $row->{isbn};
    }
    $sth->finish;
}

sub array {
    while ( my @row = $sth->fetchrow_array ) {
        my ($title,$author,$isbn) = @row;
    }
    $sth->finish;
}

sub arrayref {
    while ( my $row = $sth->fetchrow_arrayref ) {
        my $title  = $row->[0];
        my $author = $row->[1];
        my $isbn   = $row->[2];
    }
    $sth->finish;
}

sub direct_bind {
    $sth->bind_columns( \my $title, \my $author, \my $isbn );
    while ( my $row = $sth->fetch ) {
        # No need to copy.
    }
    $sth->finish;
}

say "Running $ITERATIONS iterations";
for my $func ( qw( hashref array arrayref direct_bind ) ) {
    $sth = prep_handle();
    my $t = timeit( 1, "$func()" );
    printf( "%-11s took %s\n", $func, timestr($t) );
}

Which gives these results:

$ ./dbi-bind-bench
Running 1000000 iterations
hashref     took 7.37747 wallclock secs ( 4.98 usr +  0.25 sys =  5.23 CPU) @  0.19/s (n=1)
array       took 4.01768 wallclock secs ( 1.68 usr +  0.19 sys =  1.87 CPU) @  0.53/s (n=1)
arrayref    took 3.86365 wallclock secs ( 1.60 usr +  0.16 sys =  1.76 CPU) @  0.57/s (n=1)
direct_bind took 3.36962 wallclock secs ( 1.13 usr +  0.15 sys =  1.28 CPU) @  0.78/s (n=1)

When speed is key, bind your output variables directly.

ack 2.18 has been released; ack 3 starting development

March 24, 2017 CPAN, Tools No comments

I’ve just uploaded ack 2.18 to CPAN and to https://beyondgrep.com/.

ack 2.18 will probably be the final release in the ack 2.x series. I’m going to be starting work on ack 3.000 in earnest.  Still, if you discover problems with ack 2, please report them to https://github.com/petdance/ack2/issues

If you’re interested in ack 3 development, please sign up for the ack-dev mailing list and/or join the ack Slack.  See https://beyondgrep.com/community/ for details.

2.18    Fri Mar 24 14:53:19 CDT 2017
====================================
[FIXES]
ack no longer throws an undefined variable warning if it's called
from a directory that doesn't exist. (GH #634)

--context=0 (and its short counterpart -C 0) did not set to context
of 0.  This means that a command-line --context=0 couldn't override
a --context=5 in your ackrc.  Thanks, Ed Avis.  (GH #595)

t/ack-s.t would fail in non-English locales.  Thanks, Olivier Mengué.
(GH #485, GH #515)

[ENHANCEMENTS]
--after-context and --before-context (and their short counterparts
-A and -B) no longer require a value to be passed.  If no value is
set, they default to 2. (GH #351)

Added .xhtml to the --html filetype.  Added .wsdl to the --xml filetype.
Thanks, H.Merijn Brand.  (GH #456)

[DOCUMENTATION]
Explain that filetypes must be two characters or longer. (GH #389)

Updated incorrect docs about how ack works.  Thanks, Gerhard Poul.
(GH #543)

[INTERNALS]
Removed the abstraction of App::Ack::Resource and its subclass
App::Ack::Resource::Basic.  We are abandoning the idea that we'll have
plugins.

Removed dependency on File::Glob which isn't used.