Recently in Code craft Category

I'm in the middle of a game of Scrabulous with Christoper Humphries on Facebook, and I get "tolkien" handed to me in my tray. Good letters, and I ought to be able to make a bingo out of them. Alas, the best I could get to play on the board was "knot", but what else could I have made? Perl to the rescue!

All I need to do is match across the contents of /usr/share/dict/words in a Perl one-liner. The -n flag means "loop over the input file, but don't print $_". My little program goes in -e, and it looks like this:

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
    /e/ && /n/' /usr/share/dict/words 
allokinetic
ankylopoietic
anticlockwise
automatonlike
bibliokleptomania
....

Lots of good words, but they're awfully long. Let's limit it to seven-letter bingos. We have to use the -l flag to drop the linefeed from the input lines, so the length call is accurate.

$ perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
    /e/ && /n/ && length($_)==7' /usr/share/dict/words
$

Shoot, nothing there. Let's try eight.

perl -lne'print if /t/ && /o/ && /l/ && /k/ && /i/ &&
    /e/ && /n/ && length($_)==8' /usr/share/dict/words 
knotlike
townlike

"knotlike"! That would have been beautiful. Oh well. :-(

Thread over on perlmonks talks about Tom Christiansen's assertion that you should use it, by default, even when you only have one command-line argument to parse:

What seems to happen is that at first we just want to add--oh say for example JUST ONE, SINGLE LITTLE -v flag. Well, that's so easy enough to hand-hack, that of course we do so... But just like any other piece of software, these things all seem to have a way of overgrowing their original expectations... Getopt::Long is just *wonderful*, up--I believe--to any job you can come up with for it. Too often its absence means that I've in the long run made more work for myself--or others--by not having used it originally. [Emphasis mine -- Andy]

I can't agree more. I don't care if you use Getopt::Long or Getopt::Declare or Getopt::Lucid or any of the other variants out there. You know know know that you're going to add more arguments down the road, so why not start out right?

Yes, it can be tricky to get through all of its magic if you're unfamiliar with it, but it's pretty obvious when you see enough examples. Take a look at prove or ack for examples. mech-dump is pretty decent as an example as well:

GetOptions(
    'user=s'        => \$user,
    'password=s'    => \$pass,
    forms           => sub { push( @actions, \&dump_forms ); },
    links           => sub { push( @actions, \&dump_links ); },
    images          => sub { push( @actions, \&dump_images ); },
    all             => sub { push( @actions, \&dump_forms, \&dump_links, \&dump_images ); },
    absolute        => \$absolute,
    'agent=s'       => \$agent,
    'agent-alias=s' => \$agent_alias,
    help            => sub { pod2usage(1); },
) or pod2usage(2);

Where the value in the hashref is a variable reference, the value gets stored in there. Where it's a sub, that sub gets executed with the arguments passed in. That's the basics, and you don't have to worry about anything else. Your user can pass --abs instead of --absolute if it's unambiguous. You can have mandatory flags, as in agent=s, where --agent must take a string. On and on, it's probably got the functionality you need.

One crucial reminder: You must check the return code of GetOptions. Otherwise, your program will carry on. If someone gives your program an invalid argument on the command-line, then you know that the program cannot possibly be running in the way the user intended. Your program must stop immediately.

Not checking the return of GetOptions is as bad as not checking the return of open. In fact, I think I smell a new Perl Critic policy....

From The Pragmatic Programmer:

What's the value of pi? If you're wondering how much edging to put around a circular flower bed, then "3" is probably good enough. If you're in school, then maybe "22/7" is a good approximation. If you're in NASA, then maybe 12 decimal places will do.

Here's a little article about the "file header tax", lines of boilerplate at the top of files that serve no purpose. Copyright notices, disclaimers, maybe even some revision history, it's all just clutter, and clutter is technical debt.

Take a look at the next file you edit. Is there anything at the top of it that is not functional code? Ask yourself if it really needs to be there. If in doubt, throw it out.

Jared Parsons writes about how Part of being a good programmer is learning not to trust yourself. It's filled with basic but all-too-often-forgotten wisdom about defensive programming. Key bits: "Turn assumptions into compiler errors," "The best way to avoid making bad assumptions is to actively question them at all times," and "1 test is worth 1000 expert opinions."

I also chuckled to see a sidebar disclaimer that said "All code posted to this site is covered under the Microsoft Permissive Lice." I'd heard of parasitic licensing before, but never like this!

Dropping vowels to shorten names is a terrible practice. Quick, someone give me an idea what $hdnchgdsp means, an Actual Variable from some Bad Code I'm working on today.

It's not just variables names, either. Filenames often need to be shortened, but dropping vowels is not the way to do it. You're left with unpronounceable names that are annoying to type.

The key to effective abbreviation is not removal of letters from the middle of the words, but from the end. Sometimes, it doesn't make sense to shorten a word at all, like "post". If you have a file that is supposed to "post audit transactions", call it "post-aud-trans" or "post-aud-trx", not "pst_adt_trns".

How not to do a Changes file

| | Comments (5)

Here's how to not do a Changes file:

http://search.cpan.org/src/FELICITY/Mail-SpamAssassin-3.1.5/Changes

That tells me nothing about whether I want to upgrade my SpamAssassin install. :-(

Oh, look, I wrote about this before, and how great Tim Bunce's Changes files are.

From David Fetter's page at http://fetter.org/optimization.html:

  1. The first rule of Optimization Club is, you do not Optimize.
  2. The second rule of Optimization Club is, you do not Optimize without measuring.
  3. If your app is running faster than the underlying transport protocol, the optimization is over.
  4. One factor at a time.
  5. No marketroids, no marketroid schedules.
  6. Testing will go on as long as it has to.
  7. If this is your first night at Optimization Club, you have to write a test case.

Of course it's company policy never to imply ownership of a performance problem. Always use the indefinite article: "a performance problem", never "your performance problem."

Inspired by a post in the Beautiful Code blog, Simon Wistow created Acme::Numbers, which lets you do cleverness like this:

use Acme::Numbers;

print two.hundred."\n";               # prints 200
print forty.two."\n";                 # prints 42
print zero.point.zero.five."\n";      # prints 0.05
print four.pounds.fifty.five."\n";    # prints "4.55"
print four.pounds.fifty.pence."\n";   # prints "4.50"
print four.dollars.fifty.cents."\n";  # prints "4.55"

You probably wouldn't want to do this in production code, but like the best of Damian Conway's not-useful-but-thought-provoking modules, it may spark some ideas that you can apply to more useful situations. If nothing else, the source is a fine lesson in overloading and method importing.

The man who couldn't refactor

| | Comments (1)

For the past few months, I've been slogging through some PHP code written by a solo programmer with no real oversight from other programmers. His code is a monoculture.

I found this bit of code today that just sums up his unwillingness, or perhaps inability, to refactor.

if (substr($libtest,0,12) == "FOO COUNTY")
    $foocounty=$getmultiples=1;
if ((substr($libtest,0,12) == "FOO COUNTY") && ($state=="TX")) {
    $foocounty=$getmultiples=0;
    $foocountytx=$getmultiples=1;
}

Just read those lines of code and you can recreate the crime in your head. First there was a customer in Foo County. Then, we had to handle a different Foo County, but this Foo County was in Texas. He couldn't even be bothered to change the initial test to be more specific, or to modify the existing code. His solution was the simplest thing that could possibly work, and was also the worst: Reversing the effect of the first check for Foo County. There's also no checking for non-Texas, non-original Foo County, but when I checked I found that we have customers that are in Foo County in THREE different states.

The programmer no longer works for us, so I'm unable to ask him about his motivations. I'm fascinated by the mindset that is unable to do the barest rework necessary.

About this Archive

This page is a archive of recent entries in the Code craft category.

CPAN is the next category.

Find recent content on the main index or look in the archives to find all content.

Other Perl Sites

Other Swell Blogs

  • geek2geek: An ongoing analysis of how geeks communicate, how we fail and how to fix it.