Recently in Databases Category

Alfie John over at rental-property.co.nz wrote to tell that the source code for the entire site, written using Mason and Class::DBI, is available for download.

For someone wanting to see an overview of how either Mason or Class::DBI work with real-world examples, not just samples from documentation, this is a great place to start.

I'd been working on a new functional index for the work website. I created a Pgsql function to normalize the title of a book

CREATE OR REPLACE FUNCTION exacttitle_key( TEXT ) 
    RETURNS text AS $$
    DECLARE
        key TEXT := upper( $1 );

    BEGIN
        key = regexp_replace( key,
            '^ *(?:A|AN|EL|LA|LO|THE|LOS|LAS)\\M *', '' );
        key = regexp_replace( key, '[^0-9A-Z ]+', '', 'g' );
        key = regexp_replace( key, ' {2,}', ' ', 'g' );

        RETURN trim( key );
    END

$$ LANGUAGE 'plpgsql'
IMMUTABLE STRICT;

and tested it out, and all looked well. It was marked as IMMUTABLE, so Pg can use it as an index function. I created the index in psql:

create index testbook_exacttitle on testbook 
    using btree (exacttitle_key(title));

And all was well. Now I wanted to see how long it took to create that index, so from the shell I did:

$ time psql -c'drop index testbook_exacttitle; \
    create index testbook_exacttitle  on testbook \
    using btree (exacttitle_key(title));'

I knew it would take about 5 minutes to add this index on 6.7 million records in testbook, so I didn't expect it to come back right away. Then I realized that site response fell off the table. ptop showed a couple dozen SELECT queries waiting to run. I killed the process that was running the CREATE INDEX. All the pending queries went on their merry way. Everything was back to normal.

I tried that command line again, and the results were identical. Dozens of queries backed up until I killed the CREATE INDEX process. But why were those queries backing up? That index was not used by any code yet. I asked in #postgresql, but nobody knew the answer. Then, someone said a word that clicked in my head. I made a little change to how I was running the commands, and everything worked just fine.

What was the word that helped Encyclopedia Lester figure out the problem? Turn to page 47 for the answer.

The word was "transaction". If there are multiple commands as part of the -c option to psql, they are executed in in one transaction. DROP INDEX blocks on the entire table, so the entire transaction blocked. When I ran the DROP INDEX separately, and then reran the CREATE INDEX by itself, there was only the long blocking on the new index, which did not yet exist.

(With apologies to Donald J. Sobol and Encyclopedia Brown)

I see this missed so many times I have to bring it up here: "If you have a database column that contains only digits, but will not perform calculations on it, make it a character column."

You CAN store a 10-digit phone number as an integer, but why would you want to? You CAN store a Social Security Number as a 9-digit number, but why would you want to? Surely you're not so concerned of a few bytes savings. Storing an SSN of "0123456789" as a number means you use the leading zero, too, so you lose fidelity of data. Any string of digits follows this rule. You don't perform calculations on part numbers, course numbers, Dewey Decimal numbers, or house numbers, either, so make 'em all character fields.

Same goes for years stored as date datatypes. If you're recording the year that a movie was released, then there's no advantage to having it as a date. Store it as an integer to make it simple to take differences ("How long after Citizen Kane did ET come out?") or comparisons.

Most of all, keep things consistent. If you've got a 10-character column in one table, and an integer in another, then SQL joins will be very expensive, even if both columns are indexed.

About this Archive

This page is a archive of recent entries in the Databases category.

Data munging is the previous category.

Documentation is the next category.

Find recent content on the main index or look in the archives to find all content.

Other Perl Sites

Other Swell Blogs

  • geek2geek: An ongoing analysis of how geeks communicate, how we fail and how to fix it.