Michael Schwern has started a fantastic page on the Perl 5 Wiki on best practices for keeping your Perl installation sane and happy.
Some high-level excerpts:
First and foremost thing I can say is if you depend heavily on Perl (or any single piece of technology) build it yourself.... Second, if you do build your own Perl, leave the system Perl alone..... Third, isolate your perl installs so you can have many installed in parallel....
Well-written information from someone who knows, and since it's a wiki, you can help add to it as well.
By Paul Fenwick
Perl 5.8.9 is just around the corner. Incorporating over two and a half years of bugfixes and improvements, it will be the best release of Perl 5.8 ever. Unfortunately, we have a problem; right now there's no easy way for the average developer to know what's changed.
Every version of Perl ships with a perldelta file, which summarises all the important changes into a single document for anyone who wants to know what's new. This document needs to be written before 5.8.9 can ship, and it's a big task. Luckily, it's also a task that can be distributed, and we need your help.
The work has been split into individual months of changelog that need to be summarised. You can volunteer for as little or as much work as you like. Even if you don't know much about Perl's internals, you can volunteer for a "light" approach where you summarise easy and obvious changes, like upgraded modules, or easily-understood bugfixes.
Contributing to the perl589delta directly helps with the release of Perl 5.8.9. You'll also get a mention in the prestigious Perl authors file, kudos on ohloh, and enough material to write a "What's new in 5.8.9" lightning talk that will make you a star at conferences and user-groups.
Don't worry if you don't think you can handle a whole month of changes at once. Don't worry if you don't know your way around the Perl guts. If you want to start small, you can use the micro helpers HOWTO which describes how you can contribute with a minimum of fuss.
Even if you're not sure how to help, we'd still love to see you in the group; there are plenty of things that need doing. With your help, we can make Perl 5.8.9 a reality!
Paul Fenwick is the managing director of Perl Training Australia. He is the author of Perl's new autodie pragma, an internationally recognised conference speaker, and author of many editions of Perl Tips. His interests include coffee, mycology, scuba diving, applied statistics, and lexically scoped user pragmata.
By David Golden
I want to sum up a few things that I took away from the conversations and propose a series of major changes to CPAN Testers. Special thanks to an off-list (and very civil) conversation with chromatic for triggering some of these thoughts.
Type I and Type II errors
In statistics, a Type I error means a "false positive" or "false alarm". For CPAN Testers, that's a bogus FAIL report. A Type II error means a "false negative", i.e. a bogus PASS report. Often, there is a trade-off between these. If you think about spam filtering as an example, reducing the chance of spam getting through the filter (false negatives) tends to increase the odds that legitimate mail gets flagged as spam (false positives).
Generally, those involved in CPAN Testers have taken the view that it's better to have false positives (false alarms) than false negatives (bogus PASS reports). Moreover, we've tended to believe -- without any real analysis -- that the false positive *ratio* (false FAILs divided by all FAILs) is low.
But I've never heard a single complaint about a bogus PASS report and I hear a lot of complaints about bogus FAILs, so it's reasonable to think that we've got the tradeoff wrong. Moreover, I think the downside of false positives is actually higher than that of false negatives if we believe that CPAN Testers is primarily a tool to help authors improve quality rather than a tool to give users a guarantee about how distributions work on any given platform.
False positive ratios by author
Even if the aggregate false positive ratio is low, individual CPAN authors can experience extraordinarily high false positive ratios. What I suddenly realized is that the higher the quality of an author's distributions, the higher the false positive ratio.
Consider a "low quality" author -- one who is prone to portability errors, missing dependencies and so on. Most of the FAIL reports are legitimate problems with the distribution.
Now consider a "high quality" author -- one who is careful to write portable code, well-specified dependencies and so on. For this author, most of the FAIL reports only come when a tester has a broken or misconfigured toolchain. The false positive ratio will approach 100%.
In other words, the *reward* that CPAN Testers has for high quality is increased annoyance from false FAIL reports with little benefit.
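To make the arithmetic concrete, here is a minimal sketch in Python using the definition given earlier (false FAILs divided by all FAILs). The report counts are invented purely for illustration, and the function name is hypothetical; nothing here is measured from real CPAN Testers data.

```python
def false_positive_ratio(legit_fails, bogus_fails):
    """False FAILs divided by all FAILs; 0.0 when there are no FAILs at all."""
    total = legit_fails + bogus_fails
    if total == 0:
        return 0.0
    return bogus_fails / total


# Hypothetical counts: both authors receive the same five FAILs caused by
# broken tester toolchains, but only the "low quality" author also receives
# FAILs that point at genuine problems in the distribution.
low_quality = false_positive_ratio(legit_fails=45, bogus_fails=5)
high_quality = false_positive_ratio(legit_fails=0, bogus_fails=5)

print(f"low quality author:  {low_quality:.0%}")
print(f"high quality author: {high_quality:.0%}")
```

With the same number of bogus reports, the ratio is driven entirely by how many legitimate FAILs sit alongside them, which is why it climbs as an author's own code improves.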
Repetition is desensitizing
From a statistical perspective, having lots of CPAN Testers reports for a distribution even on a common platform helps improve confidence in the aggregate result. Put differently, it helps weed out "outlier" reports from a tester who happens to have a broken toolchain.
However, from an author's perspective, if a report is legitimate (and assuming they care), they really only need to hear it once. Having more and more testers sending the same FAIL report on platform X is overkill and gives yet more encouragement for authors to tune out.
So the more successful CPAN Testers is in attracting new testers, the more duplicate FAIL reports authors are likely to receive, which makes them less likely to pay attention to them.
When is a FAIL not a FAIL?
There are legitimate reasons that distributions could be broken such that they fail during PL or make in ways that are not the fault of the tester's toolchain, so it still seems like valuable information to know when distributions can't build as well as when they don't pass tests. So we should report on this and not just skip reporting. On the other hand, most of the false positives that provoke complaint are toolchain issues during PL or make/Build.
Right now there is no easy way to distinguish the phase of a FAIL report from the subject of an email. Removing PL and make/Build failures from the FAIL category would immediately eliminate a major source of false positives in the FAIL category and decrease the aggregate false positive ratio in the FAIL category. Though, as I've shown, while this may decrease the incidence of false positives for high quality authors, the false positive ratio is likely to remain high.
It almost doesn't matter whether we reclassify these as UNKNOWN or invent new grades. Either way partitions the FAIL space in a way that makes it easier for authors to focus on whichever part of the PL/make/test cycle they care about.
What we can fix now and what we can't
Some of these issues can be addressed fairly quickly.
First, we can lower our collective tolerance of false positives -- for example, stop telling authors to just ignore bogus reports if they don't like them, and instead find ways to filter those reports out. We have several places to do this -- just in the last day we've confirmed that the latest CPANPLUS dev version doesn't generate Makefile.PL's and some testers have upgraded. BinGOs has just put out CPANPLUS::YACSmoke 0.04 that filters out these cases anyway if testers aren't on the bleeding edge of CPANPLUS. We now need to push testers to upgrade. As we find new false positives, we need to find new ways to detect and suppress them.
Second, we can reclassify PL/make/Build fails to UNKNOWN. This won't break any of the existing reporting infrastructure the way that adding new grades would. I can make this change in CPAN::Reporter in a matter of minutes and it probably wouldn't be hard to do the same in CPANPLUS. Then we need another round of pushing testers to upgrade their tools. We could also take a decision as to whether UNKNOWN reports should be copied to authors by default or just sent to the mailing list.
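As a rough illustration of the proposed reclassification, the sketch below maps the phase in which a run failed to a grade. It is written in Python for brevity and is not the code in CPAN::Reporter or CPANPLUS; the function name and the phase labels ("PL", "make", "test") are assumptions made for this example.

```python
# Hypothetical phase labels; "make" stands for either make or Build.
BUILD_PHASES = {"PL", "make"}


def grade_for(failed_phase):
    """Return a grade for the phase that failed, or for None if nothing failed."""
    if failed_phase is None:
        return "PASS"
    if failed_phase in BUILD_PHASES:
        # Proposed change: failures before the tests run are no longer FAIL.
        return "UNKNOWN"
    if failed_phase == "test":
        return "FAIL"
    raise ValueError(f"unrecognised phase: {failed_phase!r}")


for phase in (None, "PL", "make", "test"):
    print(phase, "->", grade_for(phase))
```

Because only the existing grades are used, anything downstream that already understands PASS, FAIL and UNKNOWN would not need to learn a new value.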
However, as long as the CPAN Testers system has individual testers emailing authors, there is little we can do to address the problem of repetition. One option is to remove that feature from Test::Reporter and reports will only go to the central list. With the introduction of an RSS feed (even if not yet optimal), authors will have a way to monitor reports. And from that central source, work can be done to identify duplicative reports and start screening them out of notifications.
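One way a central source might screen out duplicates is to notify only on the first report seen for a given combination of distribution, version, platform and grade. The Python sketch below assumes reports are already available as structured records; the field names and the choice of key are hypothetical, and a real system would probably need a more careful notion of "same failure".

```python
def first_reports(reports):
    """Keep only the first report for each (dist, version, platform, grade)."""
    seen = set()
    unique = []
    for report in reports:
        key = (report["dist"], report["version"],
               report["platform"], report["grade"])
        if key not in seen:
            seen.add(key)
            unique.append(report)
    return unique


# Invented example data: two testers send the same FAIL on the same platform.
incoming = [
    {"dist": "Foo-Bar", "version": "1.02", "platform": "linux", "grade": "FAIL", "tester": "a"},
    {"dist": "Foo-Bar", "version": "1.02", "platform": "linux", "grade": "FAIL", "tester": "b"},
    {"dist": "Foo-Bar", "version": "1.02", "platform": "MSWin32", "grade": "FAIL", "tester": "c"},
]

to_notify = first_reports(incoming)
print([r["tester"] for r in to_notify])
```

All reports would still be stored centrally for aggregate statistics; only the notifications sent to authors are thinned out.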
Once that is more or less reliable, we could restart email notifications from that central source if people felt that nagging is critical to improve quality. Personally, I'm coming around to the idea that it's not the right way to go culturally for the community. We should encourage people to use these tools, sign up for RSS or email alerts, whatever, because they think that quality is important. If the current nagging approach is alienating significant numbers of perl-qa members, how can we possibly expect that it's having a positive influence on everyone else?
Some of these proposals would be easier in CPAN Testers 2.0, which will provide reports as structured data instead of email text, but if "exit 0" is a straw that is breaking the Perl camel's back now, then we can't ignore 1.0 to work on 2.0 as I'm not sure anyone will care anymore by the time it's done.
What we can't do easily is get the testers community to upgrade to newer versions of the tools. That is still going to be a matter of announcements and proselytizing and so on. But I think we can make a good case for it, and if we can get the top 10 or so testers to upgrade across all their testing machines then I think we'll make a huge dent in the false positives that are undermining support for CPAN Testers as a tool for Perl software quality.
I'm interested in feedback on these ideas. In particular, I'm now convinced that the "success" of CPAN Testers prompts the need to move PL/make fails to UNKNOWN and to stop having individual testers copy authors on reports. I'm open to counter-arguments, but they'll need to convince me of a better long-run solution to the problems I identified.
David Golden is a CPAN Tester and prolific CPAN author with over two dozen modules released, including the groundbreaking CPAN::Reporter and Class::InsideOut. David was the release engineer for the alpha versions of Strawberry Perl. He has been a speaker at YAPC::NA, The New York Perl Seminar, and Boston.pm and has written articles for The Perl Review. David lives in New York City.
By Eric Wilhelm, Perl coordinator for Google Summer of Code 2008
Google's Summer of Code 2008 is wrapping up now and I'm very pleased with how well The Perl Foundation's students and mentors have done. The five projects which survived the halfway point have all finished with great results.
Many thanks to all of the mentors and students as well as everyone in the community who helped or supported the process. Also, thanks to Google for putting on the program and to Richard Dice and Jim Brandt at TPF.
But the end is only the beginning. We should really get started on next year now. Perl needs to do a better job of attracting students, but I'll have to address these issues in another post.
Most of the students did a great job of blogging their progress, which I think is an important part of Summer of Code for the rest of the community. If you have been following along with any of the student projects, please drop me a note or leave a comment. I would love to hear more opinions from outside of the active SoC participants. Also, please thank the mentors and students for their work. Of course, they "know" you appreciate their effort -- but it really means something if you actually send them an e-mail or say thanks on irc.
For those just joining us, here is a run-down of the SoC projects and some links.
Go, Richard, go!
Richard Dice, president of the Perl Foundation, is part of an article on the "state of scripting languages" in CIO magazine.
Of all the scripting languages, Perl offers the biggest installed base of applications, of code, of integrated systems, of skilled programmers. It has the lowest defect rate of any open-source software product. It is ported to essentially every hardware architecture and operating systems, from embedded control systems to mainframes. It is optimized for speed, for memory footprint, for programmer productivity. It has readily-accessible libraries for all types of programming tasks: Web application development, systems and network integration and management, end-user application development, middleware programming, REST and service-oriented architecture programming. Perl is ideal for the organization that takes charge of its own IT future.