Optimizing file searches with File::Find::Rule
Adam Kennedy posted an excellent article about huge performance hits he found with File::Find::Rule. From the docs, there's this sample to find all the *.pm files in @INC:
    # Find all the .pm files in @INC
    my @files = File::Find::Rule->file
                                ->name( '*.pm' )
                                ->in( @INC );

What this search REALLY says is "Find every single file in all these trees, then do a slow IO stat call to the operating system on every single one to work out which ones are files, and only then do a quick regex match on the file names to keep the 5% that have the ending we want and throw away the 95% that don't".
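Since File::Find::Rule applies its rules in the order they are chained, one way to avoid most of those stat calls is to put the cheap name match ahead of the file test. Here is a minimal sketch of that reordering; `find_pm_files` is a hypothetical helper name, and the `grep` over @INC is my own addition to skip code-ref hooks and missing directories:

```perl
use strict;
use warnings;
use File::Find::Rule;

# Hypothetical helper: same result set as the documented sample, but the
# rules are chained so the name match runs before the file test.
sub find_pm_files {
    my @dirs = @_;

    # name() is a string/regex match; file() needs a stat. With name()
    # first, the stat only happens for entries already ending in .pm.
    return File::Find::Rule->name('*.pm')->file->in(@dirs);
}

# @INC may contain code refs (hooks) or directories that do not exist.
my @search_dirs = grep { !ref($_) && -d $_ } @INC;

my @files = find_pm_files(@search_dirs);
printf "%d .pm files found\n", scalar @files;
```

Both orderings should return the same list; only the amount of work done per rejected entry changes.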
Now I'm wondering whether I'm doing the checks in the right order in File::Next, the lightweight file finder that ack relies on.
Well, yes and no. The way File::Next works, it basically can't call the filter callback until it knows that the entry is a file (for files()) or a directory (for dirs()), because the user expects the filter to see only those entries. So it's "the slow way", but the interface doesn't allow the stat to be optimized away. You could probably optimize ack itself by using everything() and putting the stat-based checks after the name check but before the content check.
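As a rough sketch of that ordering (this is not ack's actual code), the loop below walks a tree with File::Next::everything() and applies the checks from cheapest to most expensive. The helper name `pm_files_matching` and the `.pm` suffix are assumptions for illustration. File::Next may still need to stat entries internally to decide which directories to descend into, so measure before assuming this saves anything:

```perl
use strict;
use warnings;
use File::Next;

# Hypothetical helper, not part of ack or File::Next: returns the paths of
# .pm files under @dirs that contain a line matching $pattern.
sub pm_files_matching {
    my ( $pattern, @dirs ) = @_;

    # everything() yields files and directories alike, with no filter.
    my $iter = File::Next::everything( {}, @dirs );

    my @hits;
    while ( defined( my $path = $iter->() ) ) {
        next unless $path =~ /\.pm\z/;      # 1. name check: string match only
        next unless -f $path;               # 2. stat check: only for matching names
        open my $fh, '<', $path or next;    # 3. content check: most expensive, last
        my $found = 0;
        while ( my $line = <$fh> ) {
            if ( $line =~ $pattern ) { $found = 1; last; }
        }
        close $fh;
        push @hits, $path if $found;
    }
    return @hits;
}

# Usage: perl script.pl DIR [DIR ...]
if (@ARGV) {
    print "$_\n" for pm_files_matching( qr/use strict/, @ARGV );
}
```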