  • Big improvements in mini-CPAN tools

    The minicpan tool in CPAN::Mini lets you keep a local copy of the most recent version of each distribution on CPAN. A mini-CPAN is a great tool for anyone with a laptop, for anyone who wants to look at CPAN as a whole, or for anyone who wants to run a mini-mirror of CPAN for a large installation without hitting the net for every module install. An entire mini-CPAN takes up only about a gigabyte of drive space.

    Ricardo Signes, CPAN::Mini's author, wrote to tell me:

    CPAN::Mini 0.569 includes an obvious optimization: instead of reconnecting to your remote mirror for every file that might need updating, `minicpan` will now keep one HTTP connection open for the entire update. While I can't give numbers that reflect the most common cases of usage, a run that checks every file and finds no updates goes, on my laptop, from about two minutes to about twenty seconds -- about 1/6 the time! It also puts less load on the remote server, making it a friendlier way to keep a local mirror.
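    CPAN::Mini is written in Perl, and what follows is not its code. It is a minimal Python sketch of the same idea, assuming a plain HTTP mirror. It starts a throwaway local server that counts TCP connections. It then fetches the same files twice with the standard library's http.client: once with a new connection per file, and once over a single keep-alive connection. The class, function, and file names are hypothetical.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    """Toy mirror server that counts how many TCP connections it accepts."""
    protocol_version = "HTTP/1.1"  # allow keep-alive between requests
    connections = 0

    def setup(self):
        super().setup()
        # socketserver creates one handler instance per accepted connection
        type(self).connections += 1

    def do_GET(self):
        body = self.path.encode()  # echo the path back as the "file"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_reconnecting(host, port, paths):
    """Old behaviour: open a fresh connection for every file."""
    bodies = []
    for path in paths:
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", path)
        bodies.append(conn.getresponse().read())
        conn.close()
    return bodies


def fetch_keepalive(host, port, paths):
    """New behaviour: reuse one connection for the whole run."""
    conn = http.client.HTTPConnection(host, port)
    try:
        bodies = []
        for path in paths:
            conn.request("GET", path)
            # read the whole response before reusing the socket
            bodies.append(conn.getresponse().read())
        return bodies
    finally:
        conn.close()


def count_connections(fetch, paths):
    """Run `fetch` against a local toy server and return its connection count."""
    CountingHandler.connections = 0
    server = http.server.HTTPServer(("127.0.0.1", 0), CountingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        bodies = fetch(host, port, paths)
        assert bodies == [p.encode() for p in paths]
        return CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    files = [f"/file{i}" for i in range(5)]  # hypothetical file names
    print("reconnecting:", count_connections(fetch_reconnecting, files))
    print("keep-alive:  ", count_connections(fetch_keepalive, files))
```

    Run as a script, it reports 5 connections for the reconnecting version and 1 for the keep-alive version. Avoiding a connection setup per file is the kind of saving Ricardo describes, both in elapsed time and in load on the remote mirror.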

    Adam Kennedy has also just posted about a major upgrade that speeds up CPAN::Mini::Extract, a tool that makes it easy to get individual files out of tarballs:

    By shifting expansion to a one-shot extraction to a temp file, and then opening tarballs once from the temp file, I managed to get a two to three times speed up for file extraction. Combined with CPAN::Mini pipelining, this makes CPAN::Mini::Extract massively faster (a 200%-300% overall speed up).
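    CPAN::Mini::Extract is Perl, and Adam's description is all we have here. The following is only a Python sketch of the general technique as described, under stated assumptions. It decompresses the .tar.gz once into an uncompressed temporary file. It then opens that plain tar once and pulls every wanted member out of it. The function names extract_members and sample_tarball and the sample file names are hypothetical.

```python
import gzip
import io
import shutil
import tarfile
import tempfile


def extract_members(tgz, wanted):
    """Return {member name: bytes} for each name in `wanted` found in `tgz`.

    `tgz` may be a path or a binary file object holding a gzipped tarball.
    """
    found = {}
    with tempfile.NamedTemporaryFile() as tmp:
        # One-shot: decompress the whole archive into a plain tar temp file
        with gzip.open(tgz, "rb") as gz:
            shutil.copyfileobj(gz, tmp)
        tmp.seek(0)
        # Open the uncompressed tar once and pull every wanted file from it
        with tarfile.open(fileobj=tmp, mode="r:") as tar:
            for name in wanted:
                try:
                    member = tar.getmember(name)
                except KeyError:
                    continue  # not in this distribution
                data = tar.extractfile(member)
                if data is not None:  # None for directories and other non-files
                    found[name] = data.read()
    return found


def sample_tarball():
    """Build a small hypothetical distribution tarball in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in [("Foo-1.0/META.yml", b"name: Foo\n"),
                           ("Foo-1.0/lib/Foo.pm", b"package Foo;\n1;\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(extract_members(sample_tarball(),
                          ["Foo-1.0/META.yml", "Foo-1.0/Makefile.PL"]))
```

    Because the temporary file is an uncompressed, seekable tar, tarfile can seek straight to each member. Reading from the .tar.gz directly would mean running gzip over the archive again for every file.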

  • Post-Its from BarCampPortland

    Selena Deckelmann has come back from BarCampPortland with photos of every Post-It on the topic selection board. At an unconference like BarCamp, the topic selection board is where people stick up Post-Its, each naming a topic they'd like to see presented, for everyone to see. The topics that get the most votes are the ones that get presented.

    Scanning through the photoset on Flickr is fascinating, as these often are. Topics range from "Pirates Paying Artists" to "WordPress as CMS" to "How to lie with statistics" to "Should we replace Congress with a wiki?"

    It's also fascinating to see how widespread Twitter has become, with half the Post-Its giving @usernames as contact information.

    Makes me want to start up a Bar Camp Chicago. And move to Portland.

  • How fresh is the CPAN?

    According to statistics by LaPerla, the freshest 25% of CPAN is newer than how old?

    1. 3.8 weeks
    2. 38 days
    3. 3.8 months
    4. 38 months
    5. 3.8 years

  • TPF wants your input on Q2 grant proposals

    In a new move for TPF, the grants committee is soliciting community input on this quarter's proposals. Alberto Simões writes:

    To this post follows a set of posts with proposals received by the Perl Foundation grants committee during the second call for grant proposals for 2008. Although not usual, the rules of the TPF GC are changing and we hope to make this a rule. Proposals are accepted during one month and after that period, they are posted for public discussion on the Internet. This is important to make GC more aware of the community interest on the project, and to help opening the grants attribution process.

    During the month of April we received the following grant proposals:

    Please take some time on reading the proposals carefully and give some feedback on the relevance of the proposals.

    The article doesn't say where or how to give feedback, or by when. I'd start at the original posting on the TPF blog.

  • The case of the blocking CREATE INDEX call

    I'd been working on a new functional index for the work website. I created a PL/pgSQL function to normalize the title of a book:

    CREATE OR REPLACE FUNCTION exacttitle_key( text )
    RETURNS text AS $$
    DECLARE
        key TEXT := upper( $1 );
    BEGIN
        -- Strip one leading article (English or Spanish)
        key = regexp_replace( key,
            '^ *(?:A|AN|EL|LA|LO|THE|LOS|LAS)\M *', '' );
        -- Remove everything except digits, A-Z and spaces
        key = regexp_replace( key, '[^0-9A-Z ]+', '', 'g' );
        -- Collapse runs of spaces
        key = regexp_replace( key, ' {2,}', ' ', 'g' );
        RETURN trim( key );
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;

    I tested it out, and all looked well. It was marked as IMMUTABLE, so Pg can use it as an index function. I created the index in psql:

    create index testbook_exacttitle on testbook
    using btree (exacttitle_key(title));
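    To see what exacttitle_key actually does to a title, here is a rough Python translation of the function. It is an approximation for illustration, not something Pg runs. Python's \b stands in for PostgreSQL's end-of-word escape \M. Unicode case mapping in str.upper() may also differ from Pg's upper() for some characters. The LEADING_ARTICLE name is this sketch's own.

```python
import re

# Hypothetical name; mirrors the first regexp_replace() in exacttitle_key.
# Right after a word character, \b can only match at the end of a word,
# which is what PostgreSQL's \M means.
LEADING_ARTICLE = re.compile(r"^ *(?:A|AN|EL|LA|LO|THE|LOS|LAS)\b *")


def exacttitle_key(title: str) -> str:
    """Rough Python equivalent of the PL/pgSQL exacttitle_key() function."""
    key = title.upper()
    key = LEADING_ARTICLE.sub("", key, count=1)  # drop one leading article
    key = re.sub(r"[^0-9A-Z ]+", "", key)        # keep only digits, A-Z, spaces
    key = re.sub(r" {2,}", " ", key)              # collapse runs of spaces
    return key.strip(" ")                         # SQL trim() strips spaces only


if __name__ == "__main__":
    for title in ["The Cat in the Hat", "Los Alamos: A History", "Andy's  Book"]:
        print(repr(title), "->", repr(exacttitle_key(title)))
```

    The three sample titles come out as 'CAT IN THE HAT', 'ALAMOS A HISTORY', and 'ANDYS BOOK'. Leading articles, punctuation, and doubled spaces no longer affect the index key.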

    And all was well. Now I wanted to see how long it took to create that index, so from the shell I did:

    $ time psql -c'drop index testbook_exacttitle; 
    create index testbook_exacttitle  on testbook 
    using btree (exacttitle_key(title));'

    I knew it would take about 5 minutes to add this index on 6.7 million records in testbook, so I didn't expect it to come back right away. Then I noticed that site response had fallen off the table. ptop showed a couple dozen SELECT queries waiting to run. I killed the process that was running the CREATE INDEX, and all the pending queries went on their merry way. Everything was back to normal.

    I tried that command line again, and the results were identical. Dozens of queries backed up until I killed the CREATE INDEX process. But why were those queries backing up? That index was not used by any code yet. I asked in #postgresql, but nobody knew the answer. Then, someone said a word that clicked in my head. I made a little change to how I was running the commands, and everything worked just fine.

    What was the word that helped Encyclopedia Lester figure out the problem? Turn to page 47 for the answer.

    The word was "transaction". When you pass multiple commands in a single -c option to psql, they are executed in one transaction. DROP INDEX takes an exclusive lock on the entire table, and that lock isn't released until the transaction ends, so every query on testbook had to wait for the five-minute CREATE INDEX to finish. When I ran the DROP INDEX separately, it committed immediately and released its lock. Then the CREATE INDEX, run by itself, locked the table only against writes, and the site's SELECTs kept running.
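    A toy Python model can make the locking concrete. It is not PostgreSQL code. It hard-codes four of the table-level lock modes and their conflicts as listed in the "Explicit Locking" chapter of the PostgreSQL documentation. It also assumes the plain (not CONCURRENTLY) forms of CREATE INDEX and DROP INDEX. The names CONFLICTS, STATEMENT_LOCK, locks_held, and must_wait are made up for this sketch.

```python
# Toy model, not PostgreSQL code: a subset of the table-level lock
# conflict table from the PostgreSQL "Explicit Locking" documentation.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ACCESS EXCLUSIVE": {"ACCESS SHARE", "ROW EXCLUSIVE", "SHARE",
                         "ACCESS EXCLUSIVE"},
}

# Lock each statement takes on the table (plain forms, not CONCURRENTLY).
STATEMENT_LOCK = {
    "SELECT": "ACCESS SHARE",
    "INSERT": "ROW EXCLUSIVE",
    "CREATE INDEX": "SHARE",
    "DROP INDEX": "ACCESS EXCLUSIVE",
}


def locks_held(transaction):
    """Table locks a transaction holds after running its statements.

    Table-level locks are released only when the transaction ends,
    so every statement's lock is still held.
    """
    return {STATEMENT_LOCK[stmt] for stmt in transaction}


def must_wait(statement, held):
    """True if `statement` in another session conflicts with `held`."""
    return bool(CONFLICTS[STATEMENT_LOCK[statement]] & held)


if __name__ == "__main__":
    # psql -c 'drop index ...; create index ...'  runs as one transaction
    together = locks_held(["DROP INDEX", "CREATE INDEX"])
    # DROP INDEX already committed; CREATE INDEX runs in its own transaction
    separate = locks_held(["CREATE INDEX"])
    print("SELECT waits, one transaction:", must_wait("SELECT", together))
    print("SELECT waits, separate:      ", must_wait("SELECT", separate))
    print("INSERT waits, separate:      ", must_wait("INSERT", separate))
```

    Run as a script, it prints True, False, True. A SELECT waits behind the combined transaction but not behind CREATE INDEX alone, which still holds off writes such as INSERT.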

    (With apologies to Donald J. Sobol and Encyclopedia Brown)