I’ve started a group, rethinking-cpan, for discussing the ideas I’ve posted here. — Andy

Every few months, someone comes up with a modest proposal to improve CPAN and its public face.
Usually it’ll be about “how to make CPAN easier to search.”
It may be about adding reviews to search.cpan.org, or reorganizing the categories, or
any number of relatively easy-to-implement tasks. It’ll be a good idea,
but it’s focused too tightly.

We don’t want to “make CPAN easier to search.” What we’re really trying to do is
help with the selection process.
We want to help the user find and select the best tool for the job.

It might involve showing the user the bug queue, a list of
reviews, or an average star rating. But ultimately, the goal is
to let any person with a given problem find and select a solution.

“I want to parse XML. What should I use?” is a common question. XML::Parser? XML::Simple?
XML::Twig? If “parse XML” really means “find a single tag out of
a big order file my boss gave me,” the answer might well be a regex,
no?

Perl’s mighty CPAN is both blessing and curse. We have
14,966 distributions as I write this, but people say “I can’t find
what I want.” Searching for “XML” is barely a useful exercise.

Success in the real world

Let’s take a look at an example outside of the programming world.
In my day job, I work for Follett Library Resources and
Book Wholesalers, Inc. We are basically the Amazon.com for the school
& public library markets, respectively. The key feature of the
website is not ordering, but helping librarians decide what books
they should buy for their libraries. Imagine you have an elementary
school library and a $10,000 book budget for the year. What books
do you buy? Our website is geared to making that decision easier.

Part of the answer is technical. We have effective keyword
searching, so you can search for “horses” and get books about horses.
Part of it is filtering, like “I want books for this grade level,
and that have been positively reviewed in at least two journals,”
in addition to plain ol’ keyword searching. Part of it is showing
book covers, and reprinting reviews from journals. (If anyone’s
interested in specifics, let me know and I can probably get you
some screenshots and/or guest access.)

BWI takes it even further. There’s an entire department called
Collection Development where librarians select books, CDs & DVDs
to recommend to our customer librarians. The recommendations may be
chosen directly by the CollDev staff, or they may be
compiled from award lists (Caldecott, Newbery) or state lists (the
Texas Bluebonnet Awards, for example). Whatever the source, they
help solve the customer’s problem of “I need to buy some books,
what’s good?”

This is no small part of the business. The websites for the two
companies are key differentiators in the marketplace. Specifically,
they raise each company’s level of service from simply providing an
item to purchase to actually helping the customer do her/his job. There’s no point in providing
access to hundreds of thousands of books, CDs and DVDs if the librarian can’t decide what to buy.
FLR is the #1 vendor in the market, in large part because of the effectiveness of its website.

Relentless focus on finding the right thing

Take a look at the front of the FLR website. As I write this, the
first thing a user sees on the page is “Looking for lists of top titles?”
That link leads to a page of lists for users to browse: award lists,
popular series grouped by grade level, top video choices, a list called “Too good to miss,” and so on.
Everything the user sees is focused on “How can I help you find what you want?”

Compare that with the front page of search.cpan.org. Twenty-six links to categories that
point to modules in the archaic Module List. Go on, tell me what’s
in “Control Flow Utilities,” I dare you. Where do I find my XML
modules? Seriously, read through all 26 categories
without laughing and/or crying. Where would someone find Template
Toolkit? Catalyst? ack? Class::Accessor? That one module that
I heard about somewhere that lets me access my Lloyd’s bank account
programmatically?

Even if you can navigate the categories, it hardly matters. Clicking
through to a category leads to a list of one-line descriptions like
“Another way of exporting symbols.” Plus, the majority of modules
on CPAN are not registered in the Module List at all. The Module List is
a decade-old artifact that has long outlived its original usefulness.

What can we do?

There have been attempts, some implemented and some not, to do many
of the things that FLR & BWI do so effectively. We have
CPAN ratings and keyword searching, for example. BWI selects lists
of top books, and Shlomi Fish has recently suggested having reviews
of categories of modules, which sounds like a great idea. I made a
very tentative start on this on perl101.org. But it’s not enough.

We need to stop thinking tactically (“Let’s have reviews”)
and start thinking strategically (“How do we get the proper modules/solutions into
the hands of the users who want them?”). Nothing short of a complete
overhaul of the front end of the CPAN will make a dent in this
problem. We need revolution, not evolution, to solve it.