Finding a lost dog's owner with Perl and WWW::Mechanize

| 2 Comments

It's not every day you get to save a dog with Perl, but Perlbuzz reader Adam Gotch did just that the other day.

Adam tells me "I'm a telecommute Perl/Python contract programmer at O'Reilly Media. I live in Springboro, OH. I've been coding in Perl for about 10 years and love it."

On Saturday, Adam found a dog wandering the highway about a mile from his home. The local shelters didn't open until Monday, so he took it upon himself to try to find the owner himself.

Adam explains:

I located the Warren County dog registration website and discovered a simple web form that allowed you to look up an owner if you had the dog license # and registration year. Not having a clue what a license # looked like, I entered '1' with year '2011' and got a result. Dog license #'s were simple integers. Using binary search, I quickly discovered that there were 24996 registration records for 2011. The web form's search result provided a dog's owner's name, address and phone as well as the dog's breed, color and sex. With this knowledge I decided it was feasible to write a script to pull back all the records and filter for a female brown lab.

The dog registration website was ASP.NET with __VIEWSTATE and __EVENTVALIDATION post variables so a simple LWP script was going to be a pain. I had worked with WWW::Mechanize before so I checked the CPAN docs to see if it was going to work. It seemed to have everything I needed so I began coding. I wrote a quick test to see if I could pull back one record, but no luck. I ran wireshark captures of both a manual post in Chrome and my test script. Comparison of the captures revealed that the submit button name/value was not being sent by my script. Looking at the WWW::Mechanize docs, I found the button parameter to the submit_form() method for simulating a submit button click. It worked. I finished the script, looping over all 24996 records and soon I was pulling down all the Warren County dog registration records for 2011.

Here's the program Adam wrote:

use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
use strict;

my $m = WWW::Mechanize->new();
$m->get('http://www.co.warren.oh.us/auditor/licensing/dog_search/');
my @info = ();

$| = 1;
for (my $i = 1; $i < 24997; $i++) {
    my $response;
    eval {
        $response = $m->submit_form( form_number => 1,
            fields => {
                'ctl00$ContentPlaceHolder1$txtlicense' => "$i",
                'ctl00$ContentPlaceHolder1$txtyear' => '2011'
            },
            button => 'ctl00$ContentPlaceHolder1$btnSubmit');
    };

    if (!$@ && $response->is_success) {
        my $tree = HTML::TreeBuilder::XPath->new;
        $tree->parse($response->decoded_content);

        # Use XPath selectors to find fields in the table
        my $owner_info = $tree->findvalue('//div/fieldset[1]/p');
        my $dog_info = $tree->findvalue('//div/fieldset[2]/p');
        push @info, [$owner_info, $dog_info, $i];
        print "$owner_info|$dog_info|$i\n";
    }
    else {
        warn "WARNING: POST FAILED";
    }
    $m->back();
}

After that, it was some simple calls to grep to filter the results:

cat warren_county_dogs.txt | \
    grep -i springboro | \
    grep -i lab | \
    grep -i brown | \
    grep -i female \
    > brown_labs.txt

This narrowed down the 25,000 records to 39. That made it easily to visually scan the list and find the addresses that were closest to where the dog was found. That narrowed it down to three. Adam Googled the phone numbers, found that one was a cell, and texted it.

I texted the first number, explaining I had found this dog on the highway and sure enough, it was the owner! He promptly drove to my house to pick up "Izzy". When he arrived he was very glad to have his dog back but also confused as to how I found his phone number. I told him I "scraped" the dog registration site and left it at that (yeah it's a bit unnerving how easy it is to find information on people).

Note that if Adam was using a system that didn't have grep or ack, he could have done the string matching in the Perl program before writing out to the file:

next unless $owner_info =~ /springboro/i;
next unless $dog_info =~ /lab/ && $dog_info =~ /brown/
    && $dog_info =~ /female/ && $dog_info =~ /lab/;

He could probably have done the matching with XPath as well, but I am very green on XPath. Such a modification is left as an exercise to the reader.

Thanks for the story, Adam!

2 Comments

Useless use of cat! (Ah, I miss Usenet...) Seriously, though, very cool story. Nice job of diagnosing the missing data that the server was expecting. You never know what odd headers or fields a server will demand.

Wow...glad the guy found the dog's owner, but, am I the only one with concerns about how Warren County handles private information?

Leave a comment

Job hunting for programmers


Land the Tech Job You Love, Andy Lester's guide to job hunting for programmers and other technical professionals, is available in PDF, ePub and .mobi formats, all DRM-free, as well as good old-fashioned paper.