Bind output variables in DBI for speed and safety

| 5 Comments

When in a tight loop of many records from a database, using the quick & dirty solution of calling $sth->fetchrow_hashref can be expensive. I was working on a project to walk through 6,000,000 records and it was slower than I wanted. Some benchmarking showed me that I was paying dearly for the convenience of being able to say my $title = $row->{title};.

When I converted my code to bind variables to the columns in the statement handle, I cut my run time about 80%. It was as simple as adding this line:

    $sth->bind_columns( \my $interestlevel,
        \my $av_flag, \my $isbn, \my $title );

before calling the main loop through the database. Now DBI knows to put the data directly in there, without creating an expensive temporary hash. I also can't make a typo such as $row->{ISBN} inside the loop, so there's a measure of safety as well.

The benchmarks below show the relative speeds of each of four techniques:

hashref     took 31.5048 wallclock secs
array       took 8.83724 wallclock secs
arrayref    took 5.5308 wallclock secs
direct_bind took 4.46956 wallclock secs

Here's the key parts of the benchmark program I used:

use Benchmark ':hireswallclock';
sub hashref {
    while ( my $row = $sth->fetchrow_hashref ) {
        my $interestlevel = $row->{interestlevel};
        my $av_flag = $row->{av_flag};
        my $isbn = $row->{isbn};
        my $title = $row->{title};
    }
    $sth->finish;
}

sub array {
    while ( my @row = $sth->fetchrow_array ) {
        my ($interestlevel, $av_flag, $isbn, $title) = @row;
    }
    $sth->finish;
}

sub arrayref {
    while ( my $row = $sth->fetchrow_arrayref ) {
        my $interestlevel = $row->[0];
        my $av_flag = $row->[1];
        my $isbn = $row->[2];
        my $title = $row->[3];
    }
    $sth->finish;
}

sub direct_bind {
    $sth->bind_columns( \my $interestlevel, 
        \my $av_flag, \my $isbn, \my $title );
    while ( my $row = $sth->fetch ) {
        # no need to copy
    }
    $sth->finish;
}

for my $func ( qw( hashref array arrayref direct_bind ) ) {
    my $sql = <<"EOF";
    select interestlevel, av_flag, isbn, title
    from testbook
    limit 1000000
EOF
    # This sub calls the SQL and returns a statement handle
    $sth = sqldo_handle( $sql );
    my $t = timeit( 1, "$func()" );
    print "$func took ", timestr($t), "\n";
}

Did you find this article useful? Or does it not belong on Perlbuzz? Let us know what you think.

5 Comments

Of course if you still want the convenience of using a hash, you can bind to that as well. chromatic gives some good examples on how to do this in his classic article, DBI is ok.

Wow, I didn't even know that was possible.

I think it's time for me read the manual again, this time from start to end ;-)

Why didn't you run the obvious fourth test - assigning directly to the list of your variables, without the intermediary array?

while (my ($interestlevel, $av_flag, $isbn, $title) = $sth->fetchrow_array()) { ... }

If you believe the performance hit is a direct result of creating a temporary variable, then it would seem to make sense to try the method that eliminates that temporary variable.

just a thought, but could that drastic boost in performance from the hashref to array test be a result of query caching? just something to check on so as to be sure you are representing true performance gain through this method.

It could have been, so I tried running the different queries in different orders. The drastic difference was still there.

Leave a comment