Andy Lester: January 2008 Archives

Oh no, hourly smoke test failures in my inbox today! Looks like I put some bad HTML on the home page of work's website, and every hour one of the automated tests, via Test::HTML::Lint, told me that

# HTML::Lint errors for http://devserver.example.com/
#  (72:53) <a> at (61:53) is never closed

Well, phooey, it's a PHP-driven website, so I can't open index.php and check lines 72 and 61. I can use GET, installed with LWP, to fetch the website and save the source:

$ GET http://devserver.example.com/ > foo.html
$ vim foo.html +61

but since I'm a Perl programmer, I want to be as lazy as possible by using the tools at my disposal. In this case, it's ack, and ack has the --lines option to display ranges of lines instead of results of a regex. (Thanks to Torsten Blix for implementing this!)

$ GET http://devserver.example.com/ | ack --lines=61-72

So much nicer that way! I could still save the source and look at it in an editor, but how much easier to not have to do that.
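For the curious, here is a minimal Perl sketch of the idea behind selecting a range of lines. It is not how ack implements --lines; lines_in_range is a hypothetical helper and the sample text is made up for illustration.

```perl
use strict;
use warnings;

# Return the lines of $text whose 1-based line numbers fall in
# [$first, $last]. lines_in_range is a hypothetical helper, not part of ack.
sub lines_in_range {
    my ( $text, $first, $last ) = @_;

    # Splitting on /^/ breaks the text before each line and keeps the newlines
    my @lines = split /^/, $text;

    my @wanted;
    for my $n ( $first .. $last ) {
        next if $n < 1;
        last if $n > @lines;
        push @wanted, $lines[ $n - 1 ];
    }
    return @wanted;
}

# Made-up stand-in for the fetched page source
my $page = join '', map { "line $_\n" } 1 .. 100;
print lines_in_range( $page, 61, 72 );
```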

There is no project so small, so trivial, that it is not worth you putting it into a Subversion repository. If it's worth your time to work on it, it's worth saving. Putting it in Subversion is a matter of a few commands, and you don't have to do any big fancy-shmancy server setup.

Let's assume you're working on Linux/Unix, and you have svn installed, which is pretty standard these days. Say you're working on a game called bongo, and you've just been keeping it in ~/bongo. Do this:

# Create a directory to hold Subversion repos
$ mkdir /svn

# Create the bongo repo
$ svnadmin create /svn/bongo

# Import bongo into its project
$ cd ~/bongo
$ svn import file:///svn/bongo -m'First import of bongo into Subversion'

# Move the original bongo directory out of the way,
# in case something goes wrong
$ cd ~
$ mv ~/bongo ~/bongo-original

# Check out bongo from Subversion
$ svn co file:///svn/bongo

At this point, you'll have a checked-out version of bongo in ~/bongo, and you can make commits against it.

Ricardo Signes points out that Git makes it even easier.

# Go to the bongo directory
$ cd ~/bongo

# Create the repository and import bongo
$ git init
$ git add .
$ git commit -m'First import of bongo into Git'

With Git, everything is put in the ~/bongo/.git directory, and you don't have to check out anything from the project.

Whatever route you choose, version control is so simple these days there's just no excuse not to do it. Your programming life will never be the same.

Inspired by a post in the Beautiful Code blog, Simon Wistow created Acme::Numbers, which lets you do cleverness like this:

use Acme::Numbers;

print two.hundred."\n";               # prints 200
print forty.two."\n";                 # prints 42
print zero.point.zero.five."\n";      # prints 0.05
print four.pounds.fifty.five."\n";    # prints "4.55"
print four.pounds.fifty.pence."\n";   # prints "4.50"
print four.dollars.fifty.cents."\n";  # prints "4.50"

You probably wouldn't want to do this in production code, but like the best of Damian Conway's not-useful-but-thought-provoking modules, it may spark some ideas that you can apply to more useful situations. If nothing else, the source is a fine lesson in overloading and method importing.
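To give a feel for the overloading half of that lesson, here is a small sketch of my own. It is not Acme::Numbers' actual source: Tiny::Number and the subs two, forty and hundred are hypothetical names, the calls use explicit parentheses instead of imported barewords, and the "multiply by a hundred, otherwise add" rule is a deliberate oversimplification.

```perl
use strict;
use warnings;

package Tiny::Number;    # hypothetical class, for illustration only

use Scalar::Util 'blessed';
use overload
    '.'  => \&combine,                  # take over the concatenation operator
    '""' => sub { $_[0]{value} };       # stringify to the numeric value

sub new {
    my ( $class, $value ) = @_;
    return bless { value => $value }, $class;
}

sub combine {
    my ( $left, $right, $swapped ) = @_;

    # Number . Number builds a bigger number
    if ( blessed $right && $right->isa( __PACKAGE__ ) ) {
        my $value = $right->{value} == 100
            ? $left->{value} * 100
            : $left->{value} + $right->{value};
        return Tiny::Number->new( $value );
    }

    # Number . plain string falls back to ordinary concatenation
    return $swapped ? $right . $left->{value} : $left->{value} . $right;
}

package main;

sub two     { Tiny::Number->new( 2 ) }
sub forty   { Tiny::Number->new( 40 ) }
sub hundred { Tiny::Number->new( 100 ) }

print two() . hundred() . "\n";
print forty() . two() . "\n";
```

The real module goes further by importing the number words into your namespace, which is the method-importing half of the lesson.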

Vim tricks for Perl

This thread on use.perl.org points to some cool vim support for Perl. I'm not sure I like all the doodads in perl-support.vim, but I did add this to my .vimrc:

autocmd FileType perl :noremap K :!perldoc <cword>
\ <bar><bar> perldoc -f <cword><cr>

Now hitting K in vim runs perldoc or perldoc -f on the word under the cursor.

The man who couldn't refactor

For the past few months, I've been slogging through some PHP code written by a solo programmer with no real oversight from other programmers. His code is a monoculture.

I found this bit of code today that just sums up his unwillingness, or perhaps inability, to refactor.

if (substr($libtest,0,12) == "FOO COUNTY")
    $foocounty=$getmultiples=1;
if ((substr($libtest,0,12) == "FOO COUNTY") && ($state=="TX")) {
    $foocounty=$getmultiples=0;
    $foocountytx=$getmultiples=1;
}

Just read those lines of code and you can recreate the crime in your head. First there was a customer in Foo County. Then, we had to handle a different Foo County, but this Foo County was in Texas. He couldn't even be bothered to change the initial test to be more specific, or to modify the existing code. His solution was the simplest thing that could possibly work, and was also the worst: Reversing the effect of the first check for Foo County. There's also no checking for non-Texas, non-original Foo County, but when I checked I found that we have customers that are in Foo County in THREE different states.

The programmer no longer works for us, so I'm unable to ask him about his motivations. I'm fascinated by the mindset that is unable to do the barest rework necessary.

Make your own mini CPAN

Ricardo Signes' marvelous module CPAN::Mini just got an update today, and it reminds me to tell you all how great it is to be able to have a small version of the CPAN on your local hard drive, especially on a laptop. The included minicpan program makes it trivial to update your local archive.

First, I make a little ~/.minicpanrc that looks like this:

local: ~/minicpan/
remote: http://cpan.pair.com/pub/CPAN/
also_mirror: indices/ls-lR.gz

And then I run minicpan every so often. This pulls in the latest version of each distribution, and deletes ones that are obsoleted by newer versions. When I run minicpan, it looks like this:

$ minicpan
authors/01mailrc.txt.gz ... updated
modules/02packages.details.txt.gz ... updated
modules/03modlist.data.gz ... updated
mkdir /tmp/Woq_DHsWsN/indices
indices/ls-lR.gz ... updated
indices/ls-lR.gz ... updated
mkdir /home/andy/minicpan/authors/id/G/GR/GROMMIER
authors/id/G/GR/GROMMIER/Text-Editor-Easy-0.01.tar.gz ... updated
authors/id/G/GR/GROMMIER/CHECKSUMS ... updated
mkdir /home/andy/minicpan/authors/id/Z/ZO/ZOFFIX
authors/id/Z/ZO/ZOFFIX/Acme-BabyEater-0.01.tar.gz ... updated
authors/id/Z/ZO/ZOFFIX/CHECKSUMS ... updated
...
cleaning /home/andy/minicpan/authors/id/L/LU/LUKEROSS/DBIx-StORM-0.04.tar.gz ...done
cleaning /home/andy/minicpan/authors/id/L/LU/LUKEC/Test-WWW-Selenium-1.13.tar.gz ...done
cleaning /home/andy/minicpan/authors/id/L/LU/LUKEC/mocked-0.07.tar.gz ...done
cleaning /home/andy/minicpan/authors/id/L/LO/LOCATION/Geo-IP2Location-2.00.tar.gz ...done
cleaning /home/andy/minicpan/authors/id/L/LO/LODIN/Regexp-Exhaustive-0.03.tar.gz ...done

Currently the repository takes up only 846M of disk space. Who doesn't have an extra gig lying around these days?

uniqua:~/minicpan $ du -sh
846M	.

I also point my CPAN shell configuration to use the mini CPAN as its source of modules by prepending file:///home/andy/minicpan to the list of URLs it checks.

Thanks to Ricardo for putting out this great tool, and Randal Schwartz for the original column on which it was based.

Rafael Garcia-Suarez has posted his modifications for vim to support Perl 5.10. His are a good deal more complete than the simple modification I posted last month that just covers the say keyword.

Max Kanat-Alexander has a new blog up called Code Simplicity, and I'd love it for the name alone. His latest post, "Designing Too Far Into The Future", talks about the perils of trying to predict the future and guess what your code will have to do down the road. In the XP world, the term that gets thrown around is YAGNI, for "Ya Ain't Gonna Need It." When you have to write a report and you start by writing a report generator, that's a big violation of the principle of YAGNI.

I see this missed so many times I have to bring it up here: "If you have a database column that contains only digits, but you will not perform calculations on it, make it a character column."

You CAN store a 10-digit phone number as an integer, but why would you want to? You CAN store a Social Security Number as a 9-digit number, but why would you want to? Surely you're not that concerned about saving a few bytes. Storing an SSN of "012345678" as a number means you lose the leading zero, too, so you lose fidelity of data. Any string of digits follows this rule. You don't perform calculations on part numbers, course numbers, Dewey Decimal numbers, or house numbers, either, so make 'em all character fields.
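The leading-zero problem is easy to see in a few lines of Perl. This is only a sketch of the general effect, not of any particular database: the SSN below is a made-up value, and adding zero stands in for what an integer column would do to it.

```perl
use strict;
use warnings;

my $ssn_text   = '012345678';      # made-up SSN kept as a string
my $ssn_number = $ssn_text + 0;    # the same value forced into a number

print "as text:   $ssn_text\n";
print "as number: $ssn_number\n";

# Padding can restore the zero, but only if every reader of the data
# remembers the intended width.
my $padded = sprintf '%09d', $ssn_number;
print "re-padded: $padded\n";
```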

Same goes for years stored as date datatypes. If you're recording the year that a movie was released, then there's no advantage to having it as a date. Store it as an integer to make it simple to take differences ("How long after Citizen Kane did ET come out?") or comparisons.

Most of all, keep things consistent. If you've got a 10-character column in one table, and an integer in another, then SQL joins between them can be very expensive, even if both columns are indexed, because the database has to convert one type to the other before it can compare them.

Working on my Big Dirty PHP Project at work, I've found this bit of code in many places.

$categories = "";
$categories = Array();

Why is $categories set to an empty string, and then an array? It's not necessary to pre-initialize a variable before setting it to another value. So why is the code there? It's not just one case. It's throughout the codebase, and I delete the first line whenever I find it.

The original programmer is (thankfully) no longer around to ask, but I'm guessing it's superstition. Perhaps he had some problem that went away for an unrelated reason when he added the first line of the code. The problem is that he never considered why.

Here's another coding horror to avoid in Perl. Ever seen a regular expression by someone who wasn't entirely familiar with regexes and quoted everything whether it needed it or not?

if ( $name =~ /Marcus Holland\-Moritz/ )

The hyphen in Marcus's name isn't a metacharacter, but the unsure, superstitious programmer will quote it anyway. "Eh, it doesn't hurt anything," he may reply, but it also demonstrates his non-mastery of regexes.

If you ever find a piece of your code where you can't understand exactly why it works, why every single statement exists, stop and rework it until you do.
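If you're not sure whether a character needs quoting, a quick experiment beats superstition. The sketch below uses variable names of my own choosing; it shows that the escaped and unescaped hyphens match identically, where a hyphen actually is special, and how \Q...\E handles the cases that really do need quoting.

```perl
use strict;
use warnings;

my $name = 'Marcus Holland-Moritz';

# Outside a character class, escaping the hyphen changes nothing
my $escaped   = $name =~ /Marcus Holland\-Moritz/ ? 1 : 0;
my $unescaped = $name =~ /Marcus Holland-Moritz/  ? 1 : 0;
print "escaped: $escaped, unescaped: $unescaped\n";

# A hyphen is only special inside a character class, where it marks a range
my $in_range = 'b' =~ /[a-c]/ ? 1 : 0;

# When text really might contain metacharacters, \Q...\E quotes it for you
my $literal        = 'example.com';
my $dot_is_literal = 'exampleXcom' =~ /\Q$literal\E/ ? 0 : 1;
print "in range: $in_range, dot is literal: $dot_is_literal\n";
```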

Why I love say

Dean Wilson reveals My Favourite Three Features in a recent blog post, but doesn't discuss my #1 feature of Perl 5.10: The say command.

Why love say so much? It's just the same as print with a "\n" at the end, right? Yup, but that "\n" causes me heartache. See, I've been working on removing interpolation from my life wherever possible. For instance, we've probably all seen beginners do something like:

some_func( "$str" );

where the quotes around $str are unnecessary. (Yes, I know there could be overloaded stringification, but I'm ignoring that possibility here.) That function call should be done as:

some_func( $str );

By the same token, I don't use double quotes any more than necessary. Rather than creating a string as:

my $x = "Bloofga";

do it as

my $x = 'Bloofga';

It's not about speedups. It's about not making the code do anything more than it has to, so that the next programmer does not have to ask "why is this work getting done?" If the code doesn't need the double-quoting, then don't use the double-quoting.
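For anyone fuzzy on the difference, here is a minimal sketch of what the two kinds of quotes do. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $name = 'Bloofga';

my $single = 'Hello, $name';    # no interpolation: the text is taken literally
my $double = "Hello, $name";    # interpolation: $name is replaced by its value

print $single, "\n";
print $double, "\n";
```

When a reader sees double quotes, they have to scan the string for variables and escapes. Single quotes tell them up front there's nothing to look for.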

I started down this road when I read this rule in Perl Best Practices, but ignored it. "Eh, no biggie," I thought. Then I started using Perl::Critic, and it complained about everywhere I was using double quotes. As I examined those complaints, I came around to realize that if you're having the computer do work, the next programmer has to wonder why.

So now we get the say command, and I get to eliminate at least 50% of my necessary string interpolation. Instead of:

print "Your results are:\n";

I can now use:

say 'Your results are:';

So much cleaner. In a color-coding editor like vim, the distinction is even clearer, and as MJD likes to point out, "It's easier to see than to think."

Start using say. Even if you're not on 5.10 yet, you can use Perl6::Say for most of the places that say works in 5.10. Even better, stop using unnecessary interpolation altogether.
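Here's a small sketch, assuming Perl 5.10 or later, showing that the two forms produce the same output. The in-memory filehandles and variable names are my own scaffolding so the results can be compared; in real code you'd just call say.

```perl
use strict;
use warnings;
use feature 'say';    # enables say on Perl 5.10 and later

# Capture the output of each form in a string so they can be compared
my ( $with_print, $with_say ) = ( '', '' );

open my $print_fh, '>', \$with_print or die "Cannot open in-memory file: $!";
print {$print_fh} "Your results are:\n";
close $print_fh;

open my $say_fh, '>', \$with_say or die "Cannot open in-memory file: $!";
say {$say_fh} 'Your results are:';
close $say_fh;

my $verdict = $with_print eq $with_say ? 'identical output' : 'different output';
say $verdict;
```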

About this Archive

This page is an archive of recent entries written by Andy Lester in January 2008.
