September 2008 Archives

Optimizing for the developer, not the user: PHP misses again


PHP refuses to let you report a bug in any version of PHP older than the absolute latest & greatest.

At work today, we discovered a bug with PDO, the PHP version of Perl's DBI. Turns out if you pass in too many bind parameters, PDO segfaults. Here's the simple program that Pete Krawczyk put together to exercise it.

<?php
$dbh = new PDO( 'pgsql:host=localhost;dbname=FOO', 'PASSWORD', '', Array(
    PDO::ATTR_PERSISTENT => true,
    PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
) );

$array = Array();
for ($i = 1; $i < 10; $i++) {
    # One more bind value each pass, though the statement has only one placeholder
    $array[] = $i;
    $sth = $dbh->prepare('SELECT 1 FROM USERS WHERE CUSTID = ? LIMIT 1');
    $sth->execute($array);
    while ( $sth->fetch( PDO::FETCH_NUM ) ) {
        # do nothing
    }
    unset($sth);
    print "PDO lived OK with $i bind" . ($i == 1 ? '' : 's') . "\n";
}
?>

It's repeatable for us on PHP 5.2.5. So after searching to see that nobody else had already reported it on bugs.php.net, I went to report it.

Alas, when I went to report the bug, I was not able to. My bug happened in 5.2.5, but according to the dialog, that wasn't an option. No, I was left with "Earlier? Upgrade first!"

Latest and greatest only, please!

No, PHP, I am not going to upgrade my PHP installation in order to be blessed with the opportunity of telling you about a segfault in a version of software one minor revision older.

No, PHP, I am not going to spend an hour building and installing another monolithic PHP on some test server so that I might gain the privilege, the privilege I say!, of helping out your project.

What a backwards way to look at open source development! "You must be at least this tall in order to report bugs." What a way to help scare away contributors.

Perhaps you should have a look at how Perl handles it, where we have a wide-open ticketing system. There's a tool called perlbug that ships with Perl to encourage bug reports. The perl5-porters might get some inappropriate bug reports, maybe about a module rather than core Perl, but those are easily closed. We don't put up barriers to reporting. We know how to treat the outside world, because we welcome the feedback.

Get a clue, PHP people.

Writing a crawler with WWW::Mechanize


Stefan Petrea has written up a summary of how he built an MP3 website crawler using WWW::Mechanize and an RDBMS. It's a good write-up, and a good overview of the issues of crawling beyond the obvious "open a page, get the links, follow the links."
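
If you haven't played with WWW::Mechanize before, here's a minimal sketch of that obvious link-following loop, just to give the flavor. The starting URL is a placeholder, and a real crawler needs the dedup, politeness and storage concerns that Stefan covers.

#!/usr/bin/perl
# A minimal crawl loop with WWW::Mechanize: fetch a page, collect its
# links, and follow them.  The starting URL is a placeholder, and there's
# no politeness delay, robots.txt handling or persistence here.
use strict;
use warnings;
use WWW::Mechanize;

my $mech  = WWW::Mechanize->new( autocheck => 0 );
my @queue = ( 'http://example.com/' );
my %seen;

while ( my $url = shift @queue ) {
    next if $seen{$url}++;
    $mech->get( $url );
    next unless $mech->success && $mech->is_html;

    print "Fetched $url\n";
    for my $link ( $mech->links ) {
        my $abs = $link->url_abs->as_string;
        push @queue, $abs unless $seen{$abs};
    }
}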

Index of online articles from The Perl Journal


brian d foy has put together an index of articles published in The Perl Journal from 2003 to 2006, all available on the web.

A lot of the articles are out of date, of course (nobody needs my 2004 OSCON roundup, do they?), but others like Simon Cozens' Ten Things You (Probably) Didn't Know About Perl still stand up today.

Check out the list, and please post back here about the most useful article you found.

Announcing "The Working Geek"


I started a new blog a while ago called The Working Geek, devoted to work issues of interest to techies of all stripes who work for a living. Topics have included how to speak Manager, personal networking and closing the deal at a job interview.

I'm going to be posting much more as I approach completion of my book Land the Tech Job You Love: Why Skill and Luck Are Not Enough, to be published by Pragmatic Bookshelf. Bits of the book will probably make their way into blog posts, and I'll be posting more about work and tech articles that I see.

I hope to see you there. For convenience and so you don't have to leave the comfort of your newsreader, here's the feed.

Perl best administration practices


Michael Schwern has started a fantastic page on the Perl 5 Wiki about best practices for keeping your Perl installation sane and happy.

Some high-level excerpts:

First and foremost thing I can say is if you depend heavily on Perl (or any single piece of technology) build it yourself.... Second, if you do build your own Perl, leave the system Perl alone..... Third, isolate your perl installs so you can have many installed in parallel....

Well-written information from someone who knows, and since it's a wiki, you can help add to it as well.
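
As a tiny illustration of the third point, here's a sketch of a guard you might put at the top of a script that must run under one of your self-built perls. The /opt/perl-5.10.0 path is just a hypothetical layout, not anything the wiki prescribes.

#!/opt/perl-5.10.0/bin/perl
# Sanity check that this script is running under a self-built perl and
# not the system one.  The /opt/perl-5.10.0 prefix is hypothetical; use
# whatever layout you pick for your parallel installs.
use strict;
use warnings;
use Config;

print "Running $^X, perl $], installed under $Config{prefix}\n";

die "This script expects a self-built perl, not the system perl\n"
    if $Config{prefix} eq '/usr';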

Perl 5.8.9's perldelta needs your help


By Paul Fenwick

Perl 5.8.9 is just around the corner. Incorporating over two and a half years of bugfixes and improvements, it will be the best release of Perl 5.8 ever. Unfortunately, we have a problem; right now there's no easy way for the average developer to know what's changed.

Every version of Perl ships with a perldelta file, which summarises all the important changes into a single document for anyone who wants to know what's new. This document needs to be written before 5.8.9 can ship, and it's a big task. Luckily, it's also a task that can be distributed, and we need your help.

The work has been split into individual months of changelog that need to be summarised. You can volunteer for as little or as much work as you like. Even if you don't know much about Perl's internals, you can volunteer for a "light" approach where you summarise easy and obvious changes, like upgraded modules, or easily-understood bugfixes.

Contributing to the perl589delta directly helps with the release of Perl 5.8.9. However, you'll also get a mention in the prestigious Perl authors file, kudos on ohloh, and enough material to write a "What's new in 5.8.9" lightning talk that will make you a star at conferences and user groups.

To get started, join the mailing list. If you're happy to dive into work right away, and we hope that you are, then follow the instructions in the README at the bottom of our source control page.

Don't worry if you don't think you can handle a whole month of changes at once. Don't worry if you don't know your way around the Perl guts. If you want to start small, you can use the micro helpers HOWTO which describes how you can contribute with a minimum of fuss.

Even if you're not sure how to help, we'd still love to see you in the group; there are plenty of things that need doing. With your help, we can make Perl 5.8.9 a reality!

Paul Fenwick is the managing director of Perl Training Australia. He is the author of Perl's new autodie pragma, an internationally recognised conference speaker, and author of many editions of Perl Tips. His interests include coffee, mycology, scuba diving, applied statistics, and lexically scoped user pragmata.

The relation between CPAN Testers and quality, or, Why CPAN Testers sucks if you don't need it


By David Golden

There have been some mega-email threads about CPAN Testers on the perl-qa mailing list that started with a question about the use of exit 0 in Makefile.PL.

I want to sum up a few things that I took away from the conversations and propose a series of major changes to CPAN Testers. Special thanks to an off-list (and very civil) conversation with chromatic for triggering some of these thoughts.

Type I and Type II errors

In statistics, a Type I error means a "false positive" or "false alarm". For CPAN Testers, that's a bogus FAIL report. A Type II error means a "false negative"; for CPAN Testers, that's a bogus PASS report. Often, there is a trade-off between these. If you think about spam filtering as an example, reducing the chance of spam getting through the filter (false negatives) tends to increase the odds that legitimate mail gets flagged as spam (false positives).

Generally, those involved in CPAN Testers have taken the view that it's better to have false positives (bogus FAIL reports) than false negatives (bogus PASS reports). Moreover, we've tended to believe -- without any real analysis -- that the false positive *ratio* (false FAILs divided by all FAILs) is low.

But I've never heard a single complaint about a bogus PASS report, and I hear a lot of complaints about bogus FAILs, so it's reasonable to think that we've got the tradeoff wrong. Moreover, I think the downside to false positives is actually higher than the downside to false negatives, if we believe that CPAN Testers is primarily a tool to help authors improve quality rather than a tool to give users a guarantee about how distributions work on any given platform.

False positive ratios by author

Even if the aggregate false positive ratio is low, individual CPAN authors can experience extraordinarily high false positive ratios. What I suddenly realized is that the higher the quality of an author's distributions, the higher the false positive ratio.

Consider a "low quality" author -- one who is prone to portability errors, missing dependencies and so on. Most of the FAIL reports are legitimate problems with the distribution.

Now consider a "high quality" author -- one who is careful to write portable code, specify dependencies properly, and so on. For this author, most of the FAIL reports come only when a tester has a broken or misconfigured toolchain. The false positive ratio will approach 100%.

In other words, the *reward* that CPAN Testers has for high quality is increased annoyance from false FAIL reports with little benefit.
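
To put made-up numbers on it, suppose each author gets five bogus FAILs from broken tester toolchains, and the low-quality author also has 45 legitimate FAILs:

use strict;
use warnings;

# Hypothetical counts, purely to illustrate the false positive ratio.
my %authors = (
    low_quality  => { bogus_fails => 5, real_fails => 45 },
    high_quality => { bogus_fails => 5, real_fails => 0  },
);

for my $name ( sort keys %authors ) {
    my $counts = $authors{$name};
    my $total  = $counts->{bogus_fails} + $counts->{real_fails};
    my $ratio  = 100 * $counts->{bogus_fails} / $total;
    printf "%-12s false positive ratio: %3.0f%%\n", $name, $ratio;
}
# Prints 100% for high_quality and 10% for low_quality.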

Repetition is desensitizing

From a statistical perspective, having lots of CPAN Testers reports for a distribution even on a common platform helps improve confidence in the aggregate result. Put differently, it helps weed out "outlier" reports from a tester who happens to have a broken toolchain.

However, from the author's perspective, if a report is legitimate (and assuming they care), they really only need to hear it once. Having more and more testers send the same FAIL report on platform X is overkill and gives authors yet more encouragement to tune out.

So the more successful CPAN Testers is in attracting new testers, the more duplicate FAIL reports authors are likely to receive, which makes them less likely to pay attention to them.

When is a FAIL not a FAIL?

There are legitimate reasons a distribution could fail during PL or make through no fault of the tester's toolchain, so it still seems valuable to know when distributions can't build as well as when they don't pass tests. We should report on this rather than skip reporting it. On the other hand, most of the false positives that provoke complaints are toolchain issues during PL or make/Build.

Right now there is no easy way to distinguish the phase of a FAIL report from the subject of an email. Removing PL and make/Build failures from the FAIL category would immediately eliminate a major source of false positives and decrease the aggregate false positive ratio. Though, as I've shown, even if this decreases the incidence of false positives for high-quality authors, their false positive ratio is likely to remain high.

It almost doesn't matter whether we reclassify these as UNKNOWN or invent new grades. Either way partitions the FAIL space in a way that makes it easier for authors to focus on whichever part of the PL/make/test cycle they care about.
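
As a sketch of the idea -- this is not actual CPAN::Reporter code, just an illustration of the proposed partitioning -- the grade of a failure would depend on the phase it happened in:

use strict;
use warnings;

# A sketch of the proposed reclassification: a failure keeps the FAIL
# grade only if it happened during the test phase; failures during PL
# or make/Build are reported as UNKNOWN instead.
sub grade_for_failure {
    my ($phase) = @_;    # 'PL', 'make', 'Build' or 'test'
    return $phase eq 'test' ? 'FAIL' : 'UNKNOWN';
}

print "$_ failure => ", grade_for_failure($_), "\n"
    for qw( PL make Build test );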

What we can fix now and what we can't

Some of these issues can be addressed fairly quickly.

First, we can lower our collective tolerance of false positives -- for example, stop telling authors to just ignore bogus reports they don't like, and instead find ways to filter them out. We have several places to do this -- just in the last day we've confirmed that the latest CPANPLUS dev version doesn't generate Makefile.PLs, and some testers have upgraded. BinGOs has just put out CPANPLUS::YACSmoke 0.04, which filters out these cases anyway if testers aren't on the bleeding edge of CPANPLUS. We now need to push testers to upgrade. As we find new false positives, we need to find new ways to detect and suppress them.

Second, we can reclassify PL/make/Build fails to UNKNOWN. This won't break any of the existing reporting infrastructure the way that adding new grades would. I can make this change in CPAN::Reporter in a matter of minutes and it probably wouldn't be hard to do the same in CPANPLUS. Then we need another round of pushing testers to upgrade their tools. We could also take a decision as to whether UNKNOWN reports should be copied to authors by default or just sent to the mailing list.

However, as long as the CPAN Testers system has individual testers emailing authors, there is little we can do to address the problem of repetition. One option is to remove that feature from Test::Reporter so that reports go only to the central list. With the introduction of an RSS feed (even if it's not yet optimal), authors will have a way to monitor reports. And from that central source, work can be done to identify duplicate reports and start screening them out of notifications.
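
For example, a central service could start with screening no fancier than a seen-this-report-before key. The report fields below are made up for the sake of the sketch, not the actual report format.

use strict;
use warnings;

# A sketch of screening duplicate reports at a central source.  The
# report structure here is hypothetical.
my @reports = (
    { dist => 'Foo-Bar', version => '1.23', grade => 'FAIL', perl => '5.8.8', os => 'linux'   },
    { dist => 'Foo-Bar', version => '1.23', grade => 'FAIL', perl => '5.8.8', os => 'linux'   },
    { dist => 'Foo-Bar', version => '1.23', grade => 'FAIL', perl => '5.8.8', os => 'freebsd' },
);

my %seen;
my @worth_sending = grep {
    my $key = join '|', @{$_}{qw( dist version grade perl os )};
    !$seen{$key}++;
} @reports;

printf "%d reports received, %d worth passing along to the author\n",
    scalar @reports, scalar @worth_sending;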

Once that is more or less reliable, we could restart email notifications from that central source if people felt that nagging is critical to improve quality. Personally, I'm coming around to the idea that it's not the right way to go culturally for the community. We should encourage people to use these tools, sign up for RSS or email alerts, whatever, because they think that quality is important. If the current nagging approach is alienating significant numbers of perl-qa members, how can we possibly expect that it's having a positive influence on everyone else?

Some of these proposals would be easier in CPAN Testers 2.0, which will provide reports as structured data instead of email text. But if "exit 0" is a straw that's breaking the Perl camel's back now, then we can't ignore 1.0 while we work on 2.0; I'm not sure anyone will care anymore by the time 2.0 is done.

What we can't do easily is get the tester community to upgrade to newer versions of the tools. That is still going to be a matter of announcements and proselytizing and so on. But I think we can make a good case for it, and if we can get the top 10 or so testers to upgrade across all their testing machines, then I think we'll make a huge dent in the false positives that are undermining support for CPAN Testers as a tool for Perl software quality.

I'm interested in feedback on these ideas. In particular, I'm now convinced that the "success" of CPAN Testers prompts the need to move PL/make fails to UNKNOWN and to stop individual testers from copying authors on reports. I'm open to counter-arguments, but they'll need to convince me of a better long-run solution to the problems I've identified.

David Golden is a CPAN Tester and prolific CPAN author with over two dozen modules released, including the groundbreaking CPAN::Reporter and Class::InsideOut. David was the release engineer for the alpha versions of Strawberry Perl. He has been a speaker at YAPC::NA, The New York Perl Seminar, and Boston.pm and has written articles for The Perl Review. David lives in New York City.
