Perlbuzz news roundup for 2011-11-28

These links are collected from the
Perlbuzz Twitter feed.
If you have suggestions for news bits, please mail me at

Finding a lost dog’s owner with Perl and WWW::Mechanize

It’s not every day you get to save a dog with Perl, but Perlbuzz
reader Adam Gotch did just that the other day.

Adam tells me “I’m a telecommute Perl/Python contract programmer
at O’Reilly Media. I live in Springboro, OH. I’ve been coding in
Perl for about 10 years and love it.”

On Saturday, Adam found a dog wandering the highway about a mile
from his home. The local shelters didn’t open until Monday,
so he took it upon himself to find the owner.

Adam explains:

I located the Warren County dog registration website
and discovered a simple web form that allowed you to look up an
owner if you had the dog license # and registration year. Not having
a clue what a license # looked like, I entered ‘1’ with year ‘2011’
and got a result. Dog license #’s were simple integers. Using binary
search, I quickly discovered that there were 24996 registration
records for 2011. The web form’s search result provided a dog’s
owner’s name, address and phone as well as the dog’s breed, color
and sex. With this knowledge I decided it was feasible to write a
script to pull back all the records and filter for a female brown Lab.

The dog registration website was ASP.NET with __VIEWSTATE and
__EVENTVALIDATION post variables so a simple LWP script was going
to be a pain. I had worked with WWW::Mechanize before, so I checked
the CPAN docs to see if it was going to work. It seemed
to have everything I needed so I began coding. I wrote a quick test
to see if I could pull back one record, but no luck. I ran
captures of both a manual post in Chrome and my test script.
Comparison of the captures revealed that the submit button name/value
was not being sent by my script. Looking at the WWW::Mechanize docs,
I found the button parameter to the submit_form() method for
simulating a submit button click. It worked. I finished the script,
looping over all 24996 records and soon I was pulling down all the
Warren County dog registration records for 2011.
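
Adam doesn't show how he ran the binary search. Here is a minimal sketch of one way to do it, assuming license numbers run contiguously from 1 and that each lookup simply succeeds or fails. The name find_last_record and the simulated lookup callback are hypothetical, not part of his script:

```perl
use strict;
use warnings;

# Find the largest N for which $exists->(N) is true.
# Assumption: records are numbered contiguously from 1; gaps would break this.
sub find_last_record {
    my ($exists) = @_;
    return 0 unless $exists->(1);

    # Double the probe until it overshoots the last record.
    my ($lo, $hi) = (1, 2);
    while ($exists->($hi)) {
        $lo = $hi;
        $hi *= 2;
    }

    # Invariant: record $lo exists, record $hi does not.
    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        if ($exists->($mid)) { $lo = $mid } else { $hi = $mid }
    }
    return $lo;
}

# Simulated lookup standing in for the web form (hypothetical).
my $probes = 0;
my $last = find_last_record(sub { $probes++; $_[0] <= 24996 });
print "last=$last probes=$probes\n";
```

Against the real site, the callback would wrap a form submission and return true when a record comes back, so the upper bound is found in a few dozen requests instead of thousands.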

Here’s the program Adam wrote:

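use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

# Placeholder: the lookup form's URL does not appear in this listing.
my $url = 'http://example.com/dog-license-search';

my $m = WWW::Mechanize->new();
my @info = ();
$| = 1;    # unbuffered output, so progress shows up immediately

for (my $i = 1; $i < 24997; $i++) {
    my $response;
    eval {
        # Assumed step: reload the form so __VIEWSTATE is fresh each time
        $m->get($url);
        $response = $m->submit_form(
            form_number => 1,
            fields      => {
                'ctl00$ContentPlaceHolder1$txtlicense' => "$i",
                'ctl00$ContentPlaceHolder1$txtyear'    => '2011',
            },
            button => 'ctl00$ContentPlaceHolder1$btnSubmit',
        );
    };
    if (!$@ && $response->is_success) {
        my $tree = HTML::TreeBuilder::XPath->new;
        $tree->parse_content($response->decoded_content);
        # Use XPath selectors to find fields in the table
        my $owner_info = $tree->findvalue('//div/fieldset[1]/p');
        my $dog_info   = $tree->findvalue('//div/fieldset[2]/p');
        push @info, [$owner_info, $dog_info, $i];
        print "$owner_info|$dog_info|$i\n";
        $tree->delete;
    }
    else {
        # The original else branch is cut off here; this warning is a stand-in.
        warn "Lookup failed for license $i\n";
    }
}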

After that, it was some simple calls to grep to filter the results:

cat warren_county_dogs.txt |
grep -i springboro |
grep -i lab |
grep -i brown |
grep -i female > brown_labs.txt

This narrowed down the 25,000 records to 39. That made it easy
to visually scan the list and find the addresses that were closest
to where the dog was found. That narrowed it down to three. Adam
Googled the phone numbers, found that one was a cell, and texted it.

I texted the first number, explaining I had found this dog on the
highway and sure enough, it was the owner! He promptly drove to my
house to pick up “Izzy”. When he arrived he was very glad to have
his dog back but also confused as to how I found his phone number.
I told him I “scraped” the dog registration site and left it at
that (yeah it’s a bit unnerving how easy it is to find information
on people).

Note that if Adam was using a system that didn’t have grep or ack,
he could have done the string matching in the Perl program before
writing out to the file:

next unless $owner_info =~ /springboro/i;
next unless $dog_info =~ /lab/i && $dog_info =~ /brown/i
    && $dog_info =~ /female/i;

He could probably have done the matching with XPath as well, but I
am very green on XPath. Such a modification is left as an exercise
to the reader.
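
For readers who want a starting point on that exercise, here is a rough sketch of doing the dog-info match in XPath itself. The sample markup is invented (the real result page's HTML isn't shown here), and ci_contains is a hypothetical helper. XPath 1.0 has no case-insensitive match, so the sketch lowercases the text with translate() before calling contains():

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup standing in for one search result page.
my $html = <<'HTML';
<div>
  <fieldset><p>Jane Doe, 1 Main St, Springboro OH</p></fieldset>
  <fieldset><p>Lab Mix, Brown, Female</p></fieldset>
</div>
HTML

# Build a case-insensitive "contains" test for the current node's text.
sub ci_contains {
    my ($word) = @_;
    return "contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
         . "'abcdefghijklmnopqrstuvwxyz'), '" . lc($word) . "')";
}

# Select the dog-info paragraph only if it mentions all three words.
my $xpath = '//div/fieldset[2]/p['
          . join(' and ', map { ci_contains($_) } qw(lab brown female))
          . ']';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);
my @matches = $tree->findnodes($xpath);
print @matches ? "match\n" : "no match\n";
$tree->delete;
```

Whether this is any clearer than the regex version is debatable; the regexes are shorter and easier to change.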

Thanks for the story, Adam!

Perlbuzz news roundup for 2011-11-21


Perlbuzz news roundup for 2011-11-14


Perlbuzz news roundup for 2011-11-07


Mark Jason Dominus on giving fish

By Mark Jason Dominus, from a talk in 2003, reprinted here with permission. Sadly, it’s still relevant today.

The #perl IRC channel has a big problem. People come in asking questions, say, “How do I remove the first character from a string?” And the answer they get from the regulars on the channel is something like “perldoc perlre”.

This isn’t particularly helpful, since perlre is a very large reference manual, and even I have trouble reading it. It’s sort of like telling someone to read the Camel book when what they want to know is how to get the integer part of a number. Sure, the answer is in there somewhere, but it might take you a year to find it.

The channel regulars have this idiotic saying about how if you give a man a fish he can eat for one day, but if you teach him to fish, he can eat for his whole life. Apparently “perldoc perlre” is what passes for “teaching a man to fish” in this channel.

I’m more likely to just answer the question (you use $string =~ s/.//s) and someone once asked me why. I had to think about that a while. Two easy reasons are that it’s helpful and kind, and if you’re not in the channel to be helpful and kind, then what’s the point of answering questions at all? It’s also easy to give the answer, so why not? I’ve seen people write long treatises on why the querent should be looking in the manual instead of asking on-channel, when it would have been a lot shorter to just answer the question. That’s a puzzle all right.

The channel regulars say that answering people’s questions will make them dependent on you for assistance, which I think is bullshit. Apparently they’re worried that the same people will come back and ask more and more and more questions. They seem to have forgotten that if that did happen (and I don’t think it does) they could stop answering; problem solved.

The channel regulars also have this fantasy that saying perldoc perlre is somehow more helpful than simply answering the question, which I also think is bullshit. Something they apparently haven’t figured out is that if you really want someone to look in the manual, saying perldoc perlre is not the way to do it. A much more effective way to get them to look in the manual is to answer the question first, and then, after they thank you, say “You could have found the answer to that in the such-and-so section of the manual.” People are a lot more willing to take your advice once you have established that you are a helpful person. Saying perldoc perlre seems to me to be most effective as a way to get people to decide that Perl programmers are assholes and to quit Perl for some other language.

After I wrote the slides for this talk I found an old Usenet discussion in which I expressed many of the same views. One of the Usenet regulars went so far as to say that he didn’t answer people’s questions because he didn’t want to insult their intelligence by suggesting that they would be unable to look in the documentation, and that if he came into a newsgroup with a question and received a straightforward answer to it, he would be offended. I told him that I thought if he really believed that he needed a vacation, because it was totally warped.

Mark Jason Dominus has been doing Perl forever. He is the author of Higher-Order Perl, which belongs on the shelf of every Perl programmer. Follow him on Twitter at @mjdominus.