When in a tight loop of many records from a database, using the quick & dirty solution of calling $sth->fetchrow_hashref can be expensive. I was working on a project to walk through 6,000,000 records and it was slower than I wanted. Some benchmarking showed me that I was paying dearly for the convenience of being able to say my $title = $row->{title};.
When I converted my code to bind variables to the columns in the statement handle, I cut my run time about 80%. It was as simple as adding this line:
$sth->bind_columns( \my $interestlevel,
\my $av_flag, \my $isbn, \my $title );
before calling the main loop through the database. Now DBI knows to put the data directly in there, without creating an expensive temporary hash. I also can't make a typo such as $row->{ISBN} inside the loop, so there's a measure of safety as well.
The benchmarks below show the relative speeds of each of four techniques:
hashref took 31.5048 wallclock secs array took 8.83724 wallclock secs arrayref took 5.5308 wallclock secs direct_bind took 4.46956 wallclock secs
Here's the key parts of the benchmark program I used:
use Benchmark ':hireswallclock';
sub hashref {
while ( my $row = $sth->fetchrow_hashref ) {
my $interestlevel = $row->{interestlevel};
my $av_flag = $row->{av_flag};
my $isbn = $row->{isbn};
my $title = $row->{title};
}
$sth->finish;
}
sub array {
while ( my @row = $sth->fetchrow_array ) {
my ($interestlevel, $av_flag, $isbn, $title) = @row;
}
$sth->finish;
}
sub arrayref {
while ( my $row = $sth->fetchrow_arrayref ) {
my $interestlevel = $row->[0];
my $av_flag = $row->[1];
my $isbn = $row->[2];
my $title = $row->[3];
}
$sth->finish;
}
sub direct_bind {
$sth->bind_columns( \my $interestlevel,
\my $av_flag, \my $isbn, \my $title );
while ( my $row = $sth->fetch ) {
# no need to copy
}
$sth->finish;
}
for my $func ( qw( hashref array arrayref direct_bind ) ) {
my $sql = <<"EOF";
select interestlevel, av_flag, isbn, title
from testbook
limit 1000000
EOF
# This sub calls the SQL and returns a statement handle
$sth = sqldo_handle( $sql );
my $t = timeit( 1, "$func()" );
print "$func took ", timestr($t), "\n";
}
Did you find this article useful? Or does it not belong on Perlbuzz? Let us know what you think.
Of course if you still want the convenience of using a hash, you can bind to that as well. chromatic gives some good examples on how to do this in his classic article, DBI is ok.
Wow, I didn't even know that was possible.
I think it's time for me read the manual again, this time from start to end ;-)
Why didn't you run the obvious fourth test - assigning directly to the list of your variables, without the intermediary array?
while (my ($interestlevel, $av_flag, $isbn, $title) = $sth->fetchrow_array()) { ... }
If you believe the performance hit is a direct result of creating a temporary variable, then it would seem to make sense to try the method that eliminates that temporary variable.
just a thought, but could that drastic boost in performance from the hashref to array test be a result of query caching? just something to check on so as to be sure you are representing true performance gain through this method.
It could have been, so I tried running the different queries in different orders. The drastic difference was still there.