Optimizing file searches with File::Find::Rule

Adam Kennedy posted an excellent article about huge performance hits he found with File::Find::Rule. From the docs, there's this sample to find all the *.pm files in @INC:

# Find all the .pm files in @INC
my @files = File::Find::Rule->file
                            ->name( '*.pm' )
                            ->in( @INC );

What this search REALLY says is "Find every single file in all these trees, then do a slow IO stat call to the operating system on every single one to work out which ones are files, and only then do a quick regex match on the file names to keep the 5% that have the ending we want and throw away the 95% that don't".
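To make the ordering point concrete, here is a minimal sketch that uses only the core File::Find module rather than File::Find::Rule. The helper name find_pm_files is made up for illustration. It runs the cheap name match first, so the explicit -f test is reached only by entries whose names already end in .pm. Assuming File::Find::Rule evaluates its clauses in the order they are written, the equivalent change there would be to put ->name ahead of ->file.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: collect *.pm files under the given directories,
# checking the cheap name pattern before the more expensive -f test.
sub find_pm_files {
    my @dirs = grep { -d } @_;    # skip entries that are not directories
    return () unless @dirs;

    my @found;
    File::Find::find(
        {
            no_chdir => 1,        # $_ holds the full path of each entry
            wanted   => sub {
                return unless /\.pm\z/;    # cheap: string match on the path
                return unless -f $_;       # costly: only for name matches
                push @found, $_;
            },
        },
        @dirs,
    );
    return @found;
}

my @files = find_pm_files(@INC);
print scalar(@files), " .pm files found\n";
```

A directory that happens to be named like Foo.pm still gets rejected by the -f test, so the result is the same as with the original ordering; only the number of explicit file tests changes.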

Now I'm worried about whether I'm doing the checks in the right order in File::Next, a lightweight file finder that ack relies on.



Well, yes and no. The way File::Next works, you basically can't call the filter callback until you know whether the entry is a file or a directory when using files() or dirs(), because the user expects the filter to see only that kind of entry. So it's "the slow way," but the interface does not allow it to be optimized away. You could probably optimize ack itself by using everything() and putting the stat-based checks after the name check but before the content check.
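The ordering suggested for ack (name, then stat, then content) can be sketched with core Perl alone. This is not ack's or File::Next's actual code; file_wanted and its parameters are hypothetical names used only to show the cheapest test running first.

```perl
use strict;
use warnings;

# Hypothetical three-stage check, cheapest test first.
sub file_wanted {
    my ( $path, $name_re, $content_re ) = @_;

    return 0 unless $path =~ $name_re;      # 1. name: no I/O at all
    return 0 unless -f $path;               # 2. stat: is it a plain file?
    open my $fh, '<', $path or return 0;    # 3. content: read the file
    while ( my $line = <$fh> ) {
        return 1 if $line =~ $content_re;
    }
    return 0;
}
```

A call such as file_wanted( $path, qr/\.pm\z/, qr/^package/ ) never touches the filesystem for a path that fails the name pattern.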

About this Entry

This page contains a single entry by Andy Lester published on May 14, 2008 9:07 AM.
