November 2011 Archives

Perlbuzz news roundup for 2011-11-28

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Finding a lost dog's owner with Perl and WWW::Mechanize

It's not every day you get to save a dog with Perl, but Perlbuzz reader Adam Gotch did just that the other day.

Adam tells me "I'm a telecommute Perl/Python contract programmer at O'Reilly Media. I live in Springboro, OH. I've been coding in Perl for about 10 years and love it."

On Saturday, Adam found a dog wandering the highway about a mile from his home. The local shelters didn't open until Monday, so he set out to find the owner himself.

Adam explains:

I located the Warren County dog registration website and discovered a simple web form that allowed you to look up an owner if you had the dog license # and registration year. Not having a clue what a license # looked like, I entered '1' with year '2011' and got a result. Dog license #'s were simple integers. Using binary search, I quickly discovered that there were 24996 registration records for 2011. The web form's search result provided a dog's owner's name, address and phone as well as the dog's breed, color and sex. With this knowledge I decided it was feasible to write a script to pull back all the records and filter for a female brown lab.
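Adam didn't show his binary search, but here's a minimal sketch of the idea. It assumes license numbers are assigned consecutively from 1 with no gaps, so every number up to the last one returns a record. It first doubles an upper bound until a lookup fails, then bisects. `record_exists` is a hypothetical stand-in for submitting the form, simulated here with a fixed cutoff so it runs offline:

```perl
use strict;
use warnings;

# Hypothetical predicate: a real version would submit the search form
# for license $n and report whether a record came back. Simulated here.
my $simulated_count = 24996;
sub record_exists {
    my ($n) = @_;
    return $n >= 1 && $n <= $simulated_count;
}

# Return the highest license number that has a record, assuming
# license 1 exists and numbering has no gaps.
sub highest_record {
    my ($exists) = @_;

    # Double an upper bound until a lookup fails.
    my ($lo, $hi) = (1, 2);
    while ($exists->($hi)) {
        $lo = $hi;
        $hi *= 2;
    }

    # Binary search. Invariant: $lo has a record, $hi does not.
    while ($hi - $lo > 1) {
        my $mid = int(($lo + $hi) / 2);
        if ($exists->($mid)) { $lo = $mid } else { $hi = $mid }
    }
    return $lo;
}

print highest_record(\&record_exists), "\n";   # prints 24996
```

With 24996 records this takes 29 lookups (15 doublings, then 14 bisection steps) instead of walking the numbers one at a time.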

The dog registration website was ASP.NET with __VIEWSTATE and __EVENTVALIDATION post variables so a simple LWP script was going to be a pain. I had worked with WWW::Mechanize before so I checked the CPAN docs to see if it was going to work. It seemed to have everything I needed so I began coding. I wrote a quick test to see if I could pull back one record, but no luck. I ran wireshark captures of both a manual post in Chrome and my test script. Comparison of the captures revealed that the submit button name/value was not being sent by my script. Looking at the WWW::Mechanize docs, I found the button parameter to the submit_form() method for simulating a submit button click. It worked. I finished the script, looping over all 24996 records and soon I was pulling down all the Warren County dog registration records for 2011.

Here's the program Adam wrote:

use strict;
use warnings;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;

my $m = WWW::Mechanize->new();
$m->get('http://www.co.warren.oh.us/auditor/licensing/dog_search/');
my @info = ();

$| = 1;    # unbuffer STDOUT so each record shows up as soon as it's fetched

# License numbers for 2011 run from 1 to 24996
for my $i (1 .. 24996) {
    my $response;
    eval {
        # The button parameter makes the POST include the submit
        # button's name/value, as a click in the browser would
        $response = $m->submit_form( form_number => 1,
            fields => {
                'ctl00$ContentPlaceHolder1$txtlicense' => $i,
                'ctl00$ContentPlaceHolder1$txtyear'    => '2011'
            },
            button => 'ctl00$ContentPlaceHolder1$btnSubmit');
    };

    if (!$@ && $response->is_success) {
        my $tree = HTML::TreeBuilder::XPath->new;
        $tree->parse($response->decoded_content);
        $tree->eof;

        # Use XPath selectors to find fields in the table
        my $owner_info = $tree->findvalue('//div/fieldset[1]/p');
        my $dog_info = $tree->findvalue('//div/fieldset[2]/p');
        $tree->delete;    # free the parse tree before the next lookup

        push @info, [$owner_info, $dog_info, $i];
        print "$owner_info|$dog_info|$i\n";
    }
    else {
        warn "WARNING: POST FAILED for license $i\n";
    }
    $m->back();
}

After that, it was some simple calls to grep to filter the results:

cat warren_county_dogs.txt | \
    grep -i springboro | \
    grep -i lab | \
    grep -i brown | \
    grep -i female \
    > brown_labs.txt

This narrowed the nearly 25,000 records down to 39, few enough to scan by eye for the addresses closest to where the dog was found. That narrowed it down to three. Adam Googled the phone numbers, found that one was a cell phone, and texted it.

I texted the first number, explaining I had found this dog on the highway and sure enough, it was the owner! He promptly drove to my house to pick up "Izzy". When he arrived he was very glad to have his dog back but also confused as to how I found his phone number. I told him I "scraped" the dog registration site and left it at that (yeah it's a bit unnerving how easy it is to find information on people).

Note that if Adam were using a system that didn't have grep or ack, he could have done the string matching in the Perl program, replacing the print with one that only fires for matching records. (A "next unless" would also skip the $m->back() at the bottom of the loop.)

if ($owner_info =~ /springboro/i
    && $dog_info =~ /lab/i && $dog_info =~ /brown/i
    && $dog_info =~ /female/i) {
    print "$owner_info|$dog_info|$i\n";
}
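Pulled out into a subroutine, those checks are easy to try on their own before pointing the scraper at the live site. Here's a sketch; `is_candidate` is a hypothetical name, and the records below are made up, not real registration data. Matching is case-insensitive, like grep -i:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the owner is in Springboro and the dog
# description mentions lab, brown and female (case-insensitively).
sub is_candidate {
    my ($owner_info, $dog_info) = @_;
    return 0 unless $owner_info =~ /springboro/i;
    for my $term (qw(lab brown female)) {
        return 0 unless $dog_info =~ /\Q$term\E/i;
    }
    return 1;
}

# Made-up sample records: [owner info, dog info]
my @records = (
    [ 'Jane Doe, 123 Main St, Springboro OH', 'Labrador Retriever, Brown, Female' ],
    [ 'John Roe, 9 Elm St, Lebanon OH',       'Labrador Retriever, Brown, Female' ],
    [ 'Ann Poe, 5 Oak Ct, Springboro OH',     'Beagle, Brown, Female' ],
);

my @matches = grep { is_candidate(@$_) } @records;
print scalar(@matches), "\n";   # prints 1
```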

He could probably have done the matching with XPath as well, but I am very green on XPath. Such a modification is left as an exercise to the reader.

Thanks for the story, Adam!

Perlbuzz news roundup for 2011-11-21

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Perlbuzz news roundup for 2011-11-14

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Perlbuzz news roundup for 2011-11-07

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Mark Jason Dominus on giving fish

By Mark Jason Dominus, from a talk in 2003, reprinted here with permission. Sadly, it's still relevant today.

The #perl IRC channel has a big problem. People come in asking questions, say, "How do I remove the first character from a string?" And the answer they get from the regulars on the channel is something like "perldoc perlre".

This isn't particularly helpful, since perlre is a very large reference manual, and even I have trouble reading it. It's sort of like telling someone to read the Camel book when what they want to know is how to get the integer part of a number. Sure, the answer is in there somewhere, but it might take you a year to find it.

The channel regulars have this idiotic saying about how if you give a man a fish he can eat for one day, but if you teach him to fish, he can eat for his whole life. Apparently "perldoc perlre" is what passes for "teaching a man to fish" in this channel.

I'm more likely to just answer the question (you use $string =~ s/.//s), and someone once asked me why. I had to think about that a while. Two easy reasons are that it's helpful and kind, and if you're not in the channel to be helpful and kind, then what's the point of answering questions at all? It's also easy to give the answer, so why not? I've seen people write long treatises on why the querent should be looking in the manual instead of asking on-channel, when it would have been a lot shorter to just answer the question. That's a puzzle all right.
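For the curious, here's that one-line answer in context, alongside the four-argument substr that does the same job without a regex. The /s modifier matters: without it, . won't match a newline, so a string that starts with "\n" loses its first non-newline character instead. The subroutine names are just for illustration:

```perl
use strict;
use warnings;

# Remove the first character with a substitution. /s lets . match
# "\n" too, so a leading newline is removed like any other character.
sub drop_first_char {
    my ($string) = @_;
    $string =~ s/.//s;
    return $string;
}

# Same result with four-argument substr: replace the first character
# with the empty string. Skip empty strings.
sub drop_first_char_substr {
    my ($string) = @_;
    substr($string, 0, 1, '') if length $string;
    return $string;
}

print drop_first_char("perl"), "\n";          # prints "erl"
print drop_first_char_substr("perl"), "\n";   # prints "erl"

# Without /s, . skips the newline and removes the "a" instead.
(my $without_s = "\nabc") =~ s/.//;
print $without_s eq "\nbc" ? "newline kept\n" : "unexpected\n";   # prints "newline kept"
```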

The channel regulars say that answering people's questions will make them dependent on you for assistance, which I think is bullshit. Apparently they're worried that the same people will come back and ask more and more and more questions. They seem to have forgotten that if that did happen (and I don't think it does) they could stop answering; problem solved.

The channel regulars also have this fantasy that saying perldoc perlre is somehow more helpful than simply answering the question, which I also think is bullshit. Something they apparently haven't figured out is that if you really want someone to look in the manual, saying perldoc perlre is not the way to do it. A much more effective way to get them to look in the manual is to answer the question first, and then, after they thank you, say "You could have found the answer to that in the such-and-so section of the manual." People are a lot more willing to take your advice once you have established that you are a helpful person. Saying perldoc perlre seems to me to be most effective as a way to get people to decide that Perl programmers are assholes and to quit Perl for some other language.

After I wrote the slides for this talk I found an old Usenet discussion in which I expressed many of the same views. One of the Usenet regulars went so far as to say that he didn't answer people's questions because he didn't want to insult their intelligence by suggesting that they would be unable to look in the documentation, and that if he came into a newsgroup with a question and received a straightforward answer to it, he would be offended. I told him that I thought if he really believed that, he needed a vacation, because it was totally warped.

Mark Jason Dominus has been doing Perl forever. He is the author of Higher Order Perl, which belongs on the shelf of every Perl programmer. Follow him on Twitter at @mjdominus.