Perl 5: November 2007 Archives

Tim Bunce points me to this post about Perl being faster than Ruby in Tim Bray's Wide Finder code competition.

The Wide Finder is at heart an Apache log analysis tool to show commonly hit pages, but for purposes of this comparison, it's analyzing 971MB of log data. Bray explains:

It’s a classic example of the culture, born in Awk, perfected in Perl, of getting useful work done by combining regular expressions and hash tables. I want to figure out how to write an equivalent program that runs fast on modern CPUs with low clock rates but many cores; this is the Wide Finder project.

All the talk about Erlang and parallelism makes me want to get back to working through my copy of Programming Erlang. Oh tuits, come to me!

Rafaël Garcia-Suarez has put out the first release candidate for Perl 5.10.0. This will be the first new release of a production version of Perl in over 2½ years, so it's well worth taking a look at.

Again, for an introduction to Perl 5.10's new features, see Ricardo Signes' slides for his talk Perl 5.10 For People Who Aren't Totally Insane.

I applaud Michael Schwern's announcement today that he will no longer be supporting Perl 5.5 in any of his modules. Toolchain modules like Test::More and ExtUtils::MakeMaker will be compatible with Perl 5.6.0, and others with 5.8.0. As Schwern puts it, "5.5 is effectively end-of-lifed." And not a moment too soon, I believe. Perl 5.6.0 came out seven years ago, and 5.8.0 five.

Schwern's breaking point was seeing the Perl Survey results showing that only 6% of respondents use Perl 5.5. Most of all, he points out:

Finally, I'm coming around to chromatic's philosophy: why are we worried about the effect of upgrades on users who don't upgrade? Alan Burlson's comments about Solaris vs Linux are telling: if you're worried more about supporting your existing users then finding new ones, you're dead.

I applaud Schwern's radical break from the past. No longer will he be "hamstrung from using 'new' features of Perl," as he puts it. This will allow him the freedom to do more great things as I fully expect he will.

Most of all, I'm glad that he just did it. No committee, no call for consensus, no poll of people to see what everyone thought. JFDI, baby, JFDI.

Who among us will be the first to write a module that takes advantage of Perl 5.10's new features, urging us all forward, instead of mired in the mud of the past? I can't wait to see it happen.

A few days ago Gerard Goossen released version 1.5 of his Kurila project, a fork of both the Perl 5 language and its implementation, to the CPAN. I talked with him about the history of this new direction.

Andy: Why Kurila? Who would want to use it? What are your goals?

Gerard: Kurila is a fork of Perl 5, and Perl Kurila is a dialect of Perl. Kurila has just started and is currently unstable; the language is continuously changing.

There are a few goals, not all of them going in the same direction. One of the goals is to simplify the Perl internals to make hacking on them easier. Another is to make the Perl syntax more consistent and remove some of the oddities, most of which are historical legacy.

What is currently being done is removing some of the more error-prone syntax, like indirect object syntax, and removing symbolic references. Neither of these is very radical yet; most modern Perl doesn't use indirect object syntax or symbolic references.

But I am now at the stage of doing more radical changes, like not doing the sigil-change, so that my %foo; $foo{bar} would become my %foo; %foo{bar} .
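For readers who haven't run into these constructs, here is a minimal Perl 5 sketch of the three things mentioned above: indirect object syntax, a symbolic reference, and the sigil change. The Counter class and the variable names are made up for illustration, and none of this is Kurila code.

```perl
use strict;
use warnings;

# Hypothetical class, only here to have something to construct.
package Counter;
sub new   { my ( $class, %args ) = @_; return bless { count => $args{start} || 0 }, $class }
sub count { return $_[0]{count} }

package main;

# Indirect object syntax versus the usual arrow form.
my $direct   = Counter->new( start => 1 );
my $indirect = new Counter( start => 2 );

# A symbolic reference: the variable is looked up by its name in a string.
our $greeting = 'hello';
my $name  = 'greeting';
my $value = do { no strict 'refs'; ${"main::$name"} };

# The sigil change: %foo is the hash, but one element is written with $
# and a slice of several elements with @.
my %foo  = ( bar => 1, baz => 2 );
my $one  = $foo{bar};
my @both = @foo{qw( bar baz )};

print $direct->count, ' ', $indirect->count, " $value $one @both\n";
```

In the Kurila proposal quoted above, the element access would keep the hash's own sigil and be written %foo{bar}.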

Andy: Where do you see Kurila getting used? Who's the target audience for it?

Gerard: Kurila would be used for anything where Perl is currently being used. I am using Perl for large websites, so changes will be favored in that direction.

I am working for TTY Internet Solutions, a web development company. We develop and maintain websites in Perl, Ruby and Java. Websites we develop include www.2dehands.be, www.sellaband.com, www.ingcard.nl and www.nationalevacaturebank.nl. Of these www.2dehands.be and www.nationalevacaturebank.nl are entirely written in Perl.

We are not yet using Kurila in production, but I have a testing environment of www.2dehands.nl which is running on Kurila. Developing Kurila is part of my work at TTY.

Many of the changes in Kurila are inspired by bugs and mistakes we made developing these sites. It started with the UTF8 flag. We encountered many problems making our websites UTF-8 compatible. In many cases the UTF8 flag got "lost" somewhere, and after combining the string with another one, it got internally upgraded and our good UTF-8 was destroyed. Because everything we have is UTF-8 by default, the idea was simply to make UTF-8 the default encoding, instead of the current default of latin1.
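The kind of failure described here can be reproduced in a few lines of Perl 5. This is a sketch of one plausible way it happens, not a reconstruction of TTY's actual bug: the strings are invented, and Encode is used only to produce and inspect the bytes.

```perl
use strict;
use warnings;
use Encode qw( encode );

# UTF-8 encoded "é" whose UTF8 flag is off: Perl sees two separate bytes.
my $bytes = encode( 'UTF-8', "\x{E9}" );    # 0xC3 0xA9

# A character string that needs the flag (codepoint above 255).
my $text = "\x{263A}";

# Concatenation upgrades $bytes as if it were latin1, so each byte
# becomes its own character instead of the pair forming "é".
my $combined = $bytes . $text;

# Encoding for output now double-encodes the original "é".
my $output = encode( 'UTF-8', $combined );
my $hex    = join ' ', map { sprintf( '%02X', ord $_ ) } split //, $output;

printf "flag on bytes: %s, flag on text: %s\n",
    ( utf8::is_utf8($bytes) ? 'on' : 'off' ),
    ( utf8::is_utf8($text)  ? 'on' : 'off' );
print "characters after combining: ", length($combined), "\n";    # 3
print "output bytes: $hex\n";    # C3 83 C2 A9 E2 98 BA
```

The original two bytes C3 A9 come out as the four bytes C3 83 C2 A9, which is the familiar "Ã©" mojibake.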

Andy: Did you raise the possibility of changing the default encoding in Perl?

Gerard: The problem with changing the default encoding to UTF-8 is that it destroys the identity between bytes and codepoints. So it's not a possibility for Perl 5. Like what does chr(255) do? Does it create a byte with value 255 or a character with codepoint 255?

I made a patch removing the UTF-8 flag and changing the default encoding to UTF-8 and sent it to p5p.
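The chr(255) question can be made concrete with a short Perl 5 sketch. It assumes nothing beyond the core Encode module: the same one-character string is a single byte under latin1 but two bytes under UTF-8, which is the byte/codepoint identity a UTF-8 default would give up.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $char = chr(255);    # one character, codepoint 255

my $as_latin1 = encode( 'ISO-8859-1', $char );    # 1 byte:  0xFF
my $as_utf8   = encode( 'UTF-8',      $char );    # 2 bytes: 0xC3 0xBF

printf "characters:   %d\n", length $char;
printf "latin1 bytes: %d\n", length $as_latin1;
printf "UTF-8 bytes:  %d\n", length $as_utf8;
```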

Andy: What was the response?

Gerard: There was as good as no response to it, I guess because it was obvious that it seriously broke backwards compatibility and the patch was quite big, making it difficult to understand.

About two weeks after the utf8 patch, I announced that I wanted to change the current Perl 5 development to make it a language which evolves to experiment with new ideas, try new syntax and not be held back by old failed experiments. One of the interesting things about Perl is that it has a lot of different ideas and these are coupled to the syntax.

There was of course the question of why not Perl 6, the argument that it should or could be done in a backwards-compatible way, and the argument that there is no way of making the Perl internals clean, so it is better to start over.

And about half a year ago I announced that I had started Kurila, my proof of concept for the development of Perl 7. Rewriting some software from scratch is much more difficult than it seems, and I think starting with a well-proven, working base is much easier. Perl 5 is there, it is working very well, has few bugs, etc., but it can be much better if you don't have to worry about possibly breaking someone's code, and just fix those oddities.

Andy: Do you have a website for it?  Are you looking for help?

Gerard: There isn't a website yet, and no specific mailing list either; currently all the discussion is on p5p. There is a public git repository at git://dev.tty.nl/perl.

Andy: What can someone do if he/she is interested in helping?

Gerard: Contact me at gerard at tty dot nl. Make a clone of git://dev.tty.nl/perl and start making changes.

In a tight loop over many database records, the quick & dirty solution of calling $sth->fetchrow_hashref can be expensive. I was working on a project to walk through 6,000,000 records and it was slower than I wanted. Some benchmarking showed me that I was paying dearly for the convenience of being able to say my $title = $row->{title};.

When I converted my code to bind variables to the columns in the statement handle, I cut my run time by about 80%. It was as simple as adding this line:

    $sth->bind_columns( \my $interestlevel,
        \my $av_flag, \my $isbn, \my $title );

before calling the main loop through the database. Now DBI knows to put the data directly into those variables, without creating an expensive temporary hash. I also can't make a typo such as $row->{ISBN} inside the loop, so there's a measure of safety as well.

The benchmarks below show the relative speeds of each of four techniques:

hashref     took 31.5048 wallclock secs
array       took 8.83724 wallclock secs
arrayref    took 5.5308 wallclock secs
direct_bind took 4.46956 wallclock secs
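To put those timings in relative terms, here is a small sketch that only does arithmetic on the four numbers above. Nothing new is measured; the script and its variable names are mine.

```perl
use strict;
use warnings;

# Wallclock seconds copied from the benchmark output above.
my %secs = (
    hashref     => 31.5048,
    array       => 8.83724,
    arrayref    => 5.5308,
    direct_bind => 4.46956,
);

my $baseline = $secs{hashref};

# Slowest first, each compared against fetchrow_hashref.
for my $name ( sort { $secs{$b} <=> $secs{$a} } keys %secs ) {
    my $speedup   = $baseline / $secs{$name};
    my $reduction = 100 * ( 1 - $secs{$name} / $baseline );
    printf "%-12s %5.2fx the speed of hashref, %4.1f%% less time\n",
        $name, $speedup, $reduction;
}
```

By this arithmetic direct_bind is roughly seven times as fast as hashref on this run, a reduction of about 86%, which is in line with the "about 80%" I saw on the real job.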

Here are the key parts of the benchmark program I used:

use Benchmark ':hireswallclock';
sub hashref {
    while ( my $row = $sth->fetchrow_hashref ) {
        my $interestlevel = $row->{interestlevel};
        my $av_flag = $row->{av_flag};
        my $isbn = $row->{isbn};
        my $title = $row->{title};
    }
    $sth->finish;
}

sub array {
    while ( my @row = $sth->fetchrow_array ) {
        my ($interestlevel, $av_flag, $isbn, $title) = @row;
    }
    $sth->finish;
}

sub arrayref {
    while ( my $row = $sth->fetchrow_arrayref ) {
        my $interestlevel = $row->[0];
        my $av_flag = $row->[1];
        my $isbn = $row->[2];
        my $title = $row->[3];
    }
    $sth->finish;
}

sub direct_bind {
    # Bind each column to a lexical once; every fetch then refreshes them in place
    $sth->bind_columns( \my $interestlevel,
        \my $av_flag, \my $isbn, \my $title );
    while ( $sth->fetch ) {
        # no need to copy
    }
    $sth->finish;
}

for my $func ( qw( hashref array arrayref direct_bind ) ) {
    my $sql = <<"EOF";
    select interestlevel, av_flag, isbn, title
    from testbook
    limit 1000000
EOF
    # sqldo_handle() is a helper (not shown) that runs the SQL
    # and returns a statement handle
    $sth = sqldo_handle( $sql );
    my $t = timeit( 1, "$func()" );
    print "$func took ", timestr($t), "\n";
}
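If you want to try the harness without a database, the following is a rough stand-in using only core modules. It assumes that the cost being measured is mostly the per-row hash construction: the rows are generated in memory, via_hash and via_bound are hypothetical substitutes for the DBI calls, and timeit is given code references instead of strings. It only loosely mimics what fetchrow_hashref and bind_columns do, so treat any timings as illustrative.

```perl
use strict;
use warnings;
use Benchmark qw( :hireswallclock timeit timestr );

# Made-up stand-in for database rows: four columns, as in the query above.
my @columns = qw( interestlevel av_flag isbn title );
my @rows    = map { [ $_ % 5, $_ % 2, "isbn$_", "Title $_" ] } 1 .. 50_000;

# Build a fresh hash for every row, then read one field from it.
sub via_hash {
    my $seen = 0;
    for my $r (@rows) {
        my %row;
        @row{@columns} = @$r;
        my $title = $row{title};
        $seen++ if defined $title;
    }
    return $seen;
}

# Declare the variables once and reuse them for every row.
sub via_bound {
    my ( $interestlevel, $av_flag, $isbn, $title );
    my $seen = 0;
    for my $r (@rows) {
        ( $interestlevel, $av_flag, $isbn, $title ) = @$r;
        $seen++ if defined $title;
    }
    return $seen;
}

my %technique = ( hash_per_row => \&via_hash, bound_variables => \&via_bound );
for my $name ( sort keys %technique ) {
    my $t = timeit( 5, $technique{$name} );
    print "$name took ", timestr($t), "\n";
}
```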

Did you find this article useful? Or does it not belong on Perlbuzz? Let us know what you think.

The first release candidate of Perl 5.10, with the first new syntax and major features since 2002, will be released soon, probably in the next week or two. The code has been in feature freeze for weeks, and only minor patches are being accepted. Lately the VMS porters have been working on compatibility problems with File::Spec and File::Path, and Jos Boumans and Ken Williams are syncing core CPAN modules with the Perl source trunk.

For a gentle and well-presented introduction to the features in Perl 5.10, see Ricardo Signes' slides for his talk Perl 5.10 For People Who Aren't Totally Insane.

Please note that Perl 5.10 is not Ponie. Ponie was the project that was to put Perl 5.10 on top of the Parrot virtual machine, but Ponie has been put out to pasture.

About this Archive

This page is an archive of entries in the Perl 5 category from November 2007.
