Perlbuzz

Flash in the pan, Internet scale

Oct 8, 2008

The lovely & talented Ricardo SIGNES on his employer:

I work for Pobox. We provide identity management. For the most part, it's about email. You register an email address with us and we handle the mail for you. We send it to an IMAP store, or your current ISP, or some flash in the pan webmail provider like Google.
What's the state of Perl web frameworks?

Oct 5, 2008 • Perl 5

Joshua Hoblitt pounced on me on AIM this morning as soon as I opened my laptop.

Joshua Hoblitt: Here's something to put on Perlbuzz.

JH: WTF MVC framework is working this week?

Andy Lester: Sounds like an editorial in the making?

JH: Maypole is dead, Catalyst is um, well, I've never managed to finish a project with it.

JH: The documentation is SHIIITTT.

JH: And the book is one of the most crapped-on books I've ever seen on Amazon.

JH: So Catalyst is a no go for me.

JH: So what's left? Roll your own with Mason?

AL: CGI::Application?

JH: Ya, I've used it for small stuff.

JH: The kind of stuff you put in one monster .pm file so it's trivial to install.

JH: Hmm, there's MasonX::MiniMVC.

JH: And this egg thing.

AL: Can I post this chat as an article?

JH: Please do.

I've gone through a similar thought process recently. I've started looking at CGI::Application, but the work project where I was starting to use it has been derailed for a few weeks.

I welcome your ideas on the state of frameworks, either in comments below, or as a guest editorial.
Is learning Perl the hard way the easy way?

Oct 5, 2008 • Advocacy

Bruce Momjian, guru of PostgreSQL, has discovered the joys of Perl.

I have converted two of my most complex shell scripts to Perl; as shell scripts, they were slow and hard to maintain. The rewritten Perl scripts are 200-400 lines long (about the same length as the original shell scripts) and 15-25 times faster because of the improved algorithms possible in Perl and reduced subprocess creation.

What surprised me was how he'd learned: via a book I'd never heard of before, Learning Perl The Hard Way. Has anyone in the Perlbuzz readership read it? Comments?
Hidden features of Perl

Oct 2, 2008 • Perl 5

There's an interesting little thread at Stack Overflow on Hidden features of Perl. Go on over and add your favorites.
Developer optimization redux

Oct 1, 2008 • Community
Users are crucial to open source projects. Without them we have no reason to release publicly, and without refreshing the ranks of developers with users who join the fold, our projects die. Users are our customers, and we can't afford to treat them poorly. When a user wants to go the extra mile to help us as developers, turning him or her away is a grave misstep.

Here's an example. Andrea discovers a problem in PHP's database handling, where calling a certain function incorrectly causes a segfault. The bug isn't a work-stopper for her, and the fix is simple: Call the function correctly. Still, it's a segfault, and she figures the PHP folks will want to know about it. It also doesn't help her confidence in the tool that calling a function incorrectly segfaults. Being a good open source citizen, she decides to report the bug.

She's already spent the time figuring out the problem, and she reduces the code to a single, repeatable example that shows exactly how to make the code segfault. "This should help them track it down," she thinks. She's spent an hour on this detour in the middle of a project for work, but knows that open source relies on bug reports to get things fixed.

She dutifully checks bugs.php.net, finds nothing that matches, and goes to submit the bug. Unfortunately, the PHP site will only accept bugs against 5.2.6, not the 5.2.5 that she is running. This leaves her with four choices:
- Upgrade to 5.2.6 on a test machine, and test out her problem. She knows not to upgrade a production box so cavalierly.
- Find someone using a similar install to see if that person will test it for her.
- Submit the bug against 5.2.6, effectively lying but not spending any more of her time.
- Throw up her hands and say "Screw it, I've got work to do."
That's what happened to me, "Andrea", the other day, and I wrote about it in a frothier Perlbuzz article. I wish that my frustrations with PHP hadn't overshadowed my point about community building, so I'm trying again here.

What about the users?

My frustration with PHP's approach (and they're certainly not the only community to do this) is that the emphasis is on optimizing the time of the PHP developers who have to deal with bugs. "Who wants to deal with bugs that have already been fixed?" goes the logic. I imagine someone setting up the PHP bug database saying "We need to put something up to make sure that we don't get annoyed by bugs that have already been fixed." I can understand that motivation. As someone who answers questions in #perl about WWW::Mechanize all day, I can certainly empathize with not wanting to deal with pointless comments.

And yet...

Nowhere do I see any discussion of how the user sees the interaction. I doubt anyone considered the reaction of the user who is told "Sorry, you're not able to submit your bug report that you worked to get together to send to us." Instead, the debate about the original article has been from the point of view of the beleaguered developer, having to deal with those darn users contributing their bug fixes.

Yes, I understand that plenty of people submit bugs that aren't bugs, or that have already been fixed. Perl's bug reporting system is wide open, and I've closed my share of tickets in RT that weren't really bugs. But I'm OK with that.

How long does it take to close tickets that aren't right? Compare that cost to the cost of losing a valid bug report. Or worse, alienating a potential friend of your project.

In everything we do when working on projects, we need to remember that there are real users, real people at the other end who are the core of what we do.
Writing a crawler with WWW::Mechanize

Sep 19, 2008 • Code craft

Stefan Petrea has written up a summary of how he built an MP3 website crawler using WWW::Mechanize and an RDBMS. It's a good write-up, and a good overview of the issues of crawling beyond the obvious "open a page, get the links, follow the links."
Index of online articles from The Perl Journal

Sep 17, 2008 • Perl 5

brian d foy has put together an index of articles published in The Perl Journal from 2003 to 2006, all available on the web.

A lot of the articles are out of date, of course (nobody needs my 2004 OSCON roundup, do they?), but others like Simon Cozens' Ten Things You (Probably) Didn't Know About Perl still stand up today.

Check out the list, and please post back here about the most useful article you found.
Announcing “The Working Geek”

Sep 17, 2008

I started a new blog a while ago called The Working Geek, devoted to work issues of interest to techies of all stripes who work for a living. Topics have included how to speak Manager, personal networking and closing the deal at a job interview.

I'm going to be posting much more as I approach completion of my book Land the Tech Job You Love: Why Skill and Luck Are Not Enough to be published by Pragmatic Bookshelf. Bits of the book will probably make their way into blog posts, and I'll be posting more about work and tech articles that I see.

I hope to see you there. For convenience and so you don't have to leave the comfort of your newsreader, here's the feed.
Perl best administration practices

Sep 10, 2008 • Perl 5

Michael Schwern has started a fantastic page on the Perl 5 Wiki on best practices for keeping your Perl installation sane and happy.

Some high-level excerpts:

First and foremost thing I can say is if you depend heavily on Perl (or any single piece of technology) build it yourself.... Second, if you do build your own Perl, leave the system Perl alone..... Third, isolate your perl installs so you can have many installed in parallel....

Well-written information from someone who knows, and since it's a wiki, you can help add to it as well.
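Following the advice to build your own perl and isolate it from the system perl, it helps to confirm which interpreter a script is actually running under. The short script below is our own sketch, not something from Schwern's page. It uses only core Perl (`$^X`, `$^V`, the `Config` module and `@INC`) to print the running binary, its version, its install prefix and its module search path.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # exports %Config, the settings perl was built with

# Which perl binary is executing this script?
print "perl binary:    $^X\n";

# $^V holds the interpreter version; %vd renders it as dotted numbers.
printf "perl version:   v%vd\n", $^V;

# The prefix this perl was configured with, e.g. /usr for a system perl.
print "install prefix: $Config{prefix}\n";

# Where this perl will look for modules.
print "module search path (\@INC):\n";
print "  $_\n" for @INC;
```

If the binary or prefix points at the system perl when you expected your own build, check your PATH and the script's #! line.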
Perl 5.8.9's perldelta needs your help

Sep 8, 2008 • Perl 5

By Paul Fenwick

Perl 5.8.9 is just around the corner. Incorporating over two and a half years of bugfixes and improvements, it will be the best release of Perl 5.8 ever. Unfortunately, we have a problem; right now there's no easy way for the average developer to know what's changed.
Every version of Perl ships with a perldelta file, which summarises all the important changes into a single document for anyone who wants to know what's new. This document needs to be written before 5.8.9 can ship, and it's a big task. Luckily, it's also a task that can be distributed, and we need your help.
The work has been split into individual months of changelog that need to be summarised. You can volunteer for as little or as much work as you like. Even if you don't know much about Perl's internals, you can volunteer for a "light" approach where you summarise easy and obvious changes, like upgraded modules, or easily-understood bugfixes.
Contributing to the perl589delta directly helps the release of Perl 5.8.9, but you'll also get a mention in the prestigious Perl authors file, kudos on ohloh, and enough material to write a "What's new in 5.8.9" lightning talk that will make you a star at conferences and user-groups.
To get started, join the mailing list. If you're happy to dive into work right away, and we hope that you are, then follow the instructions in the README at the bottom of our source control page.
Don't worry if you don't think you can handle a whole month of changes at once. Don't worry if you don't know your way around the Perl guts. If you want to start small, you can use the micro helpers HOWTO which describes how you can contribute with a minimum of fuss.
Even if you're not sure how to help, we'd still love to see you in the group; there are plenty of things that need doing. With your help, we can make Perl 5.8.9 a reality!

Paul Fenwick is the managing director of Perl Training Australia. He is the author of Perl's new autodie pragma, an internationally recognised conference speaker, and author of many editions of Perl Tips. His interests include coffee, mycology, scuba diving, applied statistics, and lexically scoped user pragmata.
The relation between CPAN Testers and quality, or, Why CPAN Testers sucks if you don't need it

Sep 4, 2008 • CPAN, Perl 5

by David Golden

There have been some mega-email threads about CPAN Testers on the perl-qa mailing list that started with a question about the use of exit 0 in Makefile.PL.

I want to sum up a few things that I took away from the conversations and propose a series of major changes to CPAN Testers. Special thanks to an off-list (and very civil) conversation with chromatic for triggering some of these thoughts.

Type I and Type II errors

In statistics, a Type I error means a "false positive" or "false alarm". For CPAN Testers, that's a bogus FAIL report. A Type II error means a "false negative", e.g. a bogus PASS report. Often, there is a trade-off between these. If you think about spam filtering as an example, reducing the chance of spam getting through the filter (false negatives) tends to increase the odds that legitimate mail gets flagged as spam (false positives).

Generally, those involved in CPAN Testers have taken the view that it's better to have false positives (false alarms) than false negatives (bogus PASS reports). Moreover, we've tended to believe -- without any real analysis -- that the false positive *ratio* (false FAILs divided by all FAILs) is low.

But I've never heard a single complaint about a bogus PASS report and I hear a lot of complaints about bogus FAILs, so it's reasonable to think that we've got the tradeoff wrong. Moreover, I think the downside to false positives is actually higher than for false negatives if we believe that CPAN Testers is primarily a tool to help authors improve quality rather than a tool to give users a guarantee about how distributions work on any given platform.

False positive ratios by author

Even if the aggregate false positive ratio is low, individual CPAN authors can experience extraordinarily high false positive ratios. What I suddenly realized is that the higher the quality of an author's distributions, the higher the false positive ratio.

Consider a "low quality" author -- one who is prone to portability errors, missing dependencies and so on. Most of the FAIL reports are legitimate problems with the distribution.

Now consider a "high quality" author -- one who is careful to write portable code, well-specified dependencies and so on. For this author, most of the FAIL reports only come when a tester has a broken or misconfigured toolchain. The false positive ratio will approach 100%.

In other words, the *reward* that CPAN Testers has for high quality is increased annoyance from false FAIL reports with little benefit.
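To make the arithmetic concrete, here is a small Perl sketch of the false positive ratio defined above (false FAILs divided by all FAILs). The counts are hypothetical, chosen only for illustration. The sketch assumes that broken tester toolchains produce the same number of bogus FAILs for every author, and varies only the number of legitimate FAILs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# False positive ratio: bogus FAILs divided by all FAILs.
sub false_positive_ratio {
    my ( $bogus_fails, $legit_fails ) = @_;
    my $all_fails = $bogus_fails + $legit_fails;
    return 0 if $all_fails == 0;    # no FAILs at all, so nothing was a false alarm
    return $bogus_fails / $all_fails;
}

# Hypothetical counts for illustration only: assume toolchain breakage
# yields 5 bogus FAILs for everyone, while legitimate FAILs depend on
# the author's code quality.
my $toolchain_noise = 5;
my %legit_fails = (
    'low quality author'  => 45,
    'high quality author' => 0,
);

for my $author ( sort keys %legit_fails ) {
    printf "%-20s false positive ratio: %3.0f%%\n",
        $author,
        100 * false_positive_ratio( $toolchain_noise, $legit_fails{$author} );
}
```

With the same toolchain noise, the careless author sees mostly real problems, while every single FAIL the careful author receives is bogus.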

Repetition is desensitizing

From a statistical perspective, having lots of CPAN Testers reports for a distribution even on a common platform helps improve confidence in the aggregate result. Put differently, it helps weed out "outlier" reports from a tester who happens to have a broken toolchain.

However, from an author's perspective, if a report is legitimate (and assuming they care), they really only need to hear it once. Having more and more testers sending the same FAIL report on platform X is overkill and gives authors yet more encouragement to tune out.
So the more successful CPAN Testers is in attracting new testers, the more duplicate FAIL reports authors are likely to receive, which makes them less likely to pay attention to them.

When is a FAIL not a FAIL?

There are legitimate reasons that distributions could be broken such that they fail during PL or make in ways that are not the fault of the tester's toolchain, so it still seems like valuable information to know when distributions can't build as well as when they don't pass tests. So we should report on this and not just skip reporting. On the other hand, most of the false positives that provoke complaint are toolchain issues during PL or make/Build.

Right now there is no easy way to tell from the subject of an email which phase a FAIL report came from. Removing PL and make/Build failures from the FAIL category would immediately eliminate a major source of false positives and decrease the aggregate false positive ratio in the FAIL category. As I've shown, though, while this may reduce the number of false positives that high quality authors receive, their false positive ratio is likely to remain high.

It almost doesn't matter whether we reclassify these as UNKNOWN or invent new grades. Either way, we partition the FAIL space in a way that makes it easier for authors to focus on whichever part of the PL/make/test cycle they care about.
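As a sketch of what partitioning by phase could look like, the hypothetical function below (not code from CPAN::Reporter or CPANPLUS) takes the name of the phase that failed and returns a grade. PL and make/Build failures become UNKNOWN, and FAIL is reserved for failures while running the test suite.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical grading rule: $failed_phase is the phase that failed
# ('PL', 'make', 'Build' or 'test'), or undef if every phase succeeded.
sub grade_for {
    my ($failed_phase) = @_;
    return 'PASS'    if !defined $failed_phase;
    return 'FAIL'    if $failed_phase eq 'test';
    return 'UNKNOWN' if $failed_phase =~ /\A(?:PL|make|Build)\z/;
    die "unrecognized phase '$failed_phase'\n";
}

for my $phase ( undef, 'PL', 'make', 'Build', 'test' ) {
    printf "%-5s failed -> %s\n",
        ( defined $phase ? $phase : 'none' ), grade_for($phase);
}
```

An author who only cares about test failures could then filter on FAIL alone and ignore UNKNOWN.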

What we can fix now and what we can't

Some of these issues can be addressed fairly quickly.

First, we can lower our collective tolerance of false positives -- for example, by no longer telling authors to just ignore bogus reports if they don't like them, and by finding ways to filter those reports instead. We have several places to do this -- just in the last day we've confirmed that the latest CPANPLUS dev version doesn't generate Makefile.PL's and some testers have upgraded. BinGOs has just put out CPANPLUS::YACSmoke 0.04 that filters out these cases anyway if testers aren't on the bleeding edge of CPANPLUS. We now need to push testers to upgrade. As we find new false positives, we need to find new ways to detect and suppress them.

Second, we can reclassify PL/make/Build fails to UNKNOWN. This won't break any of the existing reporting infrastructure the way that adding new grades would. I can make this change in CPAN::Reporter in a matter of minutes and it probably wouldn't be hard to do the same in CPANPLUS. Then we need another round of pushing testers to upgrade their tools. We could also decide whether UNKNOWN reports should be copied to authors by default or just sent to the mailing list.

However, as long as the CPAN Testers system has individual testers emailing authors, there is little we can do to address the problem of repetition. One option is to remove that feature from Test::Reporter so that reports only go to the central list. With the introduction of an RSS feed (even if not yet optimal), authors will have a way to monitor reports. And from that central source, work can be done to identify duplicative reports and start screening them out of notifications.
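A central source makes duplicate screening straightforward in principle. The sketch below is hypothetical. CPAN Testers 1.0 reports are email text, so it assumes they have already been parsed into records with `dist`, `platform`, `grade` and `tester` fields (our own names). It treats "the same FAIL" as the same distribution failing on the same platform. A real system would probably want a finer key, such as the perl version as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, already-parsed reports; the names are made up.
my @reports = (
    { dist => 'Foo-Bar-1.00', platform => 'linux',   grade => 'FAIL', tester => 'alice' },
    { dist => 'Foo-Bar-1.00', platform => 'linux',   grade => 'FAIL', tester => 'bob'   },
    { dist => 'Foo-Bar-1.00', platform => 'MSWin32', grade => 'FAIL', tester => 'carol' },
    { dist => 'Foo-Bar-1.00', platform => 'linux',   grade => 'PASS', tester => 'dave'  },
);

# Keep only the first FAIL for each distribution/platform pair, so the
# author hears about a given failure once instead of once per tester.
sub notifications {
    my @all = @_;
    my %seen;
    return grep {
        $_->{grade} eq 'FAIL'
            && !$seen{ join "\0", @{$_}{qw(dist platform)} }++
    } @all;
}

for my $r ( notifications(@reports) ) {
    print "Notify author: $r->{dist} FAILs on $r->{platform} (reported by $r->{tester})\n";
}
```

Because the check runs once, centrally, adding more testers adds confidence in the aggregate result without adding more email to the author's inbox.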

Once that is more or less reliable, we could restart email notifications from that central source if people felt that nagging is critical to improve quality. Personally, I'm coming around to the idea that it's not the right way to go culturally for the community. We should encourage people to use these tools, sign up for RSS or email alerts, whatever, because they think that quality is important. If the current nagging approach is alienating significant numbers of perl-qa members, how can we possibly expect that it's having a positive influence on everyone else?

Some of these proposals would be easier in CPAN Testers 2.0, which will provide reports as structured data instead of email text, but if "exit 0" is a straw that is breaking the Perl camel's back now, then we can't ignore 1.0 to work on 2.0, as I'm not sure anyone will care anymore by the time it's done.

What we can't do easily is get the testers community to upgrade to newer versions of the tools. That is still going to be a matter of announcements and proselytizing and so on. But I think we can make a good case for it, and if we can get the top 10 or so testers to upgrade across all their testing machines then I think we'll make a huge dent in the false positives that are undermining support for CPAN Testers as a tool for Perl software quality.

I'm interested in feedback on these ideas. In particular, I'm now convinced that the "success" of CPAN Testers prompts the need to move PL/make fails to UNKNOWN and to stop individual testers from copying authors on reports. I'm open to counter-arguments, but they'll need to convince me of a better long-run solution to the problems I identified.

David Golden is a CPAN Tester and prolific CPAN author with over two dozen modules released, including the groundbreaking CPAN::Reporter and Class::InsideOut. David was the release engineer for the alpha versions of Strawberry Perl. He has been a speaker at YAPC::NA, The New York Perl Seminar, and Boston.pm and has written articles for The Perl Review. David lives in New York City.
Great results for five Perl projects in Google Summer of Code 2008

Aug 30, 2008

By Eric Wilhelm, Perl coordinator for Google Summer of Code 2008

Google's Summer of Code 2008 is wrapping up now and I'm very pleased with how well The Perl Foundation's students and mentors have done. The five projects which survived the halfway point have all finished with great results.

Many thanks to all of the mentors and students as well as everyone in the community who helped or supported the process. Also, thanks to Google for putting on the program and to Richard Dice and Jim Brandt at TPF.
But the end is only the beginning. We should really get started on next year now. Perl needs to do a better job of attracting students, but I'll have to address these issues in another post.
Most of the students did a great job of blogging their progress, which I think is an important part of Summer of Code for the rest of the community. If you have been following along with any of the student projects, please drop me a note or leave a comment. I would love to hear more opinions from outside of the active SoC participants. Also, please thank the mentors and students for their work. Of course, they "know" you appreciate their effort -- but it really means something if you actually send them an e-mail or say thanks on irc.
For those just joining us, here is a run-down of the SoC projects and some links.
- Flesh out the Perl 6 Test Suite (student: Adrian Kreher; mentor: Moritz Lenz). Blog | Code
- wxCPANPLUS (student: Samuel Tyler; mentors: Herbert Breunung, Jos Boumans). Blog | Code | CPAN distribution
- Native Call Interface Signatures and Stubs Generation for Parrot (student: Kevin Tew; mentor: Jerry Gay). Mail | Code | (older branch)
- Incremental Tricolor Garbage Collector (student: Andrew Whitworth; mentor: chromatic). Blog | Code
- Math::GSL (student: Thierry Moisan; mentor: Jonathan Leto). Blog | Code | CPAN distribution

Eric Wilhelm is a software and systems consultant, leader of the Portland Perl Mongers, and author of many CPAN modules.
Richard Dice trumpets Perl to the press

Aug 29, 2008 • Advocacy, Perl Foundation

Go, Richard, go!

Richard Dice, president of the Perl Foundation, is part of an article on the "state of scripting languages" in CIO magazine.

Of all the scripting languages, Perl offers the biggest installed base of applications, of code, of integrated systems, of skilled programmers. It has the lowest defect rate of any open-source software product. It is ported to essentially every hardware architecture and operating systems, from embedded control systems to mainframes. It is optimized for speed, for memory footprint, for programmer productivity. It has readily-accessible libraries for all types of programming tasks: Web application development, systems and network integration and management, end-user application development, middleware programming, REST and service-oriented architecture programming. Perl is ideal for the organization that takes charge of its own IT future.

Downloading video with Awk

Aug 29, 2008 • Data munging

Peteris Krumins, the prolific blogger and programmer, decided to explore TCP/IP networking in GNU Awk, and came up with this, a YouTube video downloader.

Subscribe to Peteris' blog. It's well worth reading.
Perl 6 apps today: November is a wiki written in Perl 6

Aug 28, 2008 • Perl 6

Adapted from Patrick Michaud

Carl Mäsak and Johan Viklund have recently released November, a wiki engine written in Perl 6 for Rakudo Perl, the Perl 6 implementation written for the Parrot virtual machine.

Details are available at "Announcing November, a wiki in Perl 6", with an important followup post at "November meets the Web".

Great work, and I really enjoyed the lightning talk!
Red Hat's patch slows down overloading in Perl

Aug 25, 2008 • Perl 5 • Vipul Ved Prakash, Redhat, Pete Krawczyk
Vipul Ved Prakash, long-time CPAN author and creator of Vipul's Razor, has reported a big slowdown in Red Hat's Perl package.

Some investigation revealed that there’s a long standing bug in Redhat Perl that causes *severe* performance degradation on code that uses the bless/overload combo. The thread on this is here: https://bugzilla.redhat.com/show_bug.cgi?id=379791.

Vipul's analysis is a beautiful rundown of how these kinds of things should be reported, and the techie details should help you decide whether you want to rebuild Perl from source, or wait for updated packages for RHEL and Fedora.

Pete Krawczyk sent me a few comments:

RedHat acknowledges that their patching of Perl caused slowness; if you're doing serious work with default Perl on RedHat, you might want to consider building your own until a proper patch comes along. The problem currently affects Fedora 9, RedHat 5 and spin-offs like CentOS 5. The main symptom is exponential slowdown during operations involving overloaded operators; many common modules (such as JSON and URI) are also affected.

Some other links:
- Perl community on LiveJournal
- Comments on Reddit
And here's my code to illustrate the slowdown, based on the original code in Vipul's article:
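```
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes;

# Overload an operator in package main, so that every object blessed
# into main below exercises the bless/overload combination.
use overload q(<) => sub {};

my %h;
$|++;    # unbuffer STDOUT so each pass prints as soon as it finishes

print "Pass#\tPass time\tTotal time\n";
my $bigstart = Time::HiRes::time();
for my $i ( 1..50 ) {
    my $start = Time::HiRes::time();

    # Bless 1,000 new objects per pass; on an affected Perl the
    # pass times should climb steadily instead of staying flat.
    for my $j ( 1..1000 ) {
        $h{$i*1000 + $j} = bless [ ] => 'main';
    }
    my $now = Time::HiRes::time();
    printf( "#%2d\t%f\t%f\n", $i, $now-$start, $now-$bigstart );
}
```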
How cool Perl 6 really is

Aug 21, 2008 • Perl 6

Moritz Lenz has started a series of blog posts about moving from Perl 5 to Perl 6, including why some design choices were made, and how you can take advantage of some of the Perl 6 features today in Perl 5. "The target audience are Perl 5 programmers. It is build like a tutorial, but strongly emphasizes the 'why'," he says in his use.perl posting.
Big interview with Damian Conway

Aug 21, 2008 • Interviews

O'Reilly interviewed Damian Conway at OSCON. There's surprisingly little craziness, but lots of good discussion of programming languages, programming curricula and of course, Perl 6. Oh, and a fair amount of mocking of American accents. Laugh it up, Mr. I-Live-On-A-Giant-Penal-Colony-Island!

The O'Reilly page has a transcription if you don't want to devote 36 minutes of your life to it, but why wouldn't you?
BarCamp Milwaukee 3 coming October 4th-5th

Aug 12, 2008 • Community, Conferences

Pete Prodoehl has just announced the third BarCamp Milwaukee, October 4th and 5th, 2008. Pete says:

It's a gathering of tech enthusiasts from the Wisconsin area, Milwaukee, Madison, Appleton, even Chicago. There will be sessions on web-related stuff, non-profits, co-working, and plenty of other topics. Plus, you get a free t-shirt, and we feed you! But you must participate, as there are no spectators at BarCamp. There's a good mix of presentations, discussions, working sessions and late night hacking, as well as media making, photos and video. In prior years we've had remote-control go-karts, videoblogging, gadgets, RSS and elevator hacking.

There's a video from last year's BarCamp. I hope to see you all there.
Creating Excel files with Perl

Aug 11, 2008 • CPAN, Data munging

Linux Journal has an article on creating Excel files using Spreadsheet::WriteExcel. It has its quirks, like creating corrupted spreadsheets if you try to populate a cell more than once, but when you need it, there's nothing else to do what it does.
