Bind output variables in DBI for speed and safety

| 5 Comments

When in a tight loop of many records from a database, using the quick & dirty solution of calling $sth->fetchrow_hashref can be expensive. I was working on a project to walk through 6,000,000 records and it was slower than I wanted. Some benchmarking showed me that I was paying dearly for the convenience of being able to say my $title = $row->{title};.

When I converted my code to bind variables to the columns in the statement handle, I cut my run time about 80%. It was as simple as adding this line:

    $sth->bind_columns( \my $interestlevel,
        \my $av_flag, \my $isbn, \my $title );

before calling the main loop through the database. Now DBI knows to put the data directly in there, without creating an expensive temporary hash. I also can't make a typo such as $row->{ISBN} inside the loop, so there's a measure of safety as well.

The benchmarks below show the relative speeds of each of four techniques:

hashref     took 31.5048 wallclock secs
array       took 8.83724 wallclock secs
arrayref    took 5.5308 wallclock secs
direct_bind took 4.46956 wallclock secs

Here's the key parts of the benchmark program I used:

use Benchmark ':hireswallclock';
sub hashref {
    while ( my $row = $sth->fetchrow_hashref ) {
        my $interestlevel = $row->{interestlevel};
        my $av_flag = $row->{av_flag};
        my $isbn = $row->{isbn};
        my $title = $row->{title};
    }
    $sth->finish;
}

sub array {
    while ( my @row = $sth->fetchrow_array ) {
        my ($interestlevel, $av_flag, $isbn, $title) = @row;
    }
    $sth->finish;
}

sub arrayref {
    while ( my $row = $sth->fetchrow_arrayref ) {
        my $interestlevel = $row->[0];
        my $av_flag = $row->[1];
        my $isbn = $row->[2];
        my $title = $row->[3];
    }
    $sth->finish;
}

sub direct_bind {
    $sth->bind_columns( \my $interestlevel, 
        \my $av_flag, \my $isbn, \my $title );
    while ( my $row = $sth->fetch ) {
        # no need to copy
    }
    $sth->finish;
}

for my $func ( qw( hashref array arrayref direct_bind ) ) {
    my $sql = <<"EOF";
    select interestlevel, av_flag, isbn, title
    from testbook
    limit 1000000
EOF
    # This sub calls the SQL and returns a statement handle
    $sth = sqldo_handle( $sql );
    my $t = timeit( 1, "$func()" );
    print "$func took ", timestr($t), "\n";
}

Did you find this article useful? Or does it not belong on Perlbuzz? Let us know what you think.

5 Comments

Of course if you still want the convenience of using a hash, you can bind to that as well. chromatic gives some good examples on how to do this in his classic article, DBI is ok.

Wow, I didn't even know that was possible.

I think it's time for me read the manual again, this time from start to end ;-)

Why didn't you run the obvious fourth test - assigning directly to the list of your variables, without the intermediary array?

while (my ($interestlevel, $av_flag, $isbn, $title) = $sth->fetchrow_array()) { ... }

If you believe the performance hit is a direct result of creating a temporary variable, then it would seem to make sense to try the method that eliminates that temporary variable.

just a thought, but could that drastic boost in performance from the hashref to array test be a result of query caching? just something to check on so as to be sure you are representing true performance gain through this method.

It could have been, so I tried running the different queries in different orders. The drastic difference was still there.

Leave a comment

Job hunting for programmers


Land the Tech Job You Love, Andy Lester's guide to job hunting for programmers and other technical professionals, is available in PDF, ePub and .mobi formats, all DRM-free, as well as good old-fashioned paper.