Sloppy and lazy database handling: PHP's shortcomings, part #47


Here's yet another reason why PHP is deficient when it comes to doing Real Work with Real Databases. We're currently working on a PostgreSQL-based database with six million book records. We want to walk the book table and slurp up the rows as we go. The code looks like this.

$sql = 'select field1, field2, ... field8 from book';
$set = pg_query( $sql );
while ( $row = pg_fetch_row( $set ) ) {
    // Do some processing
pg_free_result( $set );

Nothing crazy, right? I'm just walking the table, one row at a time. However, the PHP PostgreSQL functions slurp up all six million rows, even though my calling code doesn't need to see all the rows at once. This eats up tons of RAM and makes us worry about memory allocation for the PHP process. too bad for us users! We can't even get to the first pg_fetch_row() call because the pg_query slurps up all the rows immediately.

Now, there IS functionality where you can retrieve a given row, like:

$row = pg_fetch_row( $set, $rowno );

In this case, it's necessary to iterate to that given $rowno, but even then you wouldn't have to get to the end of the dataset.

If that's not lazy enough coding on the back end, check out this little nugget from the docs for the pg_free_result function:

pg_free_result() frees the memory and data associated with the specified PostgreSQL query result resource.

This function need only be called if memory consumption during script execution is a problem. Otherwise, all result memory will be automatically freed when the script ends.

The PHP docs encourage sloppy coding. Rather than promoting good, safe programming behaviors, they're saying "Eh, no big deal." It's like the PHP team knows that PHP won't be used on large datasets, and it becomes a self-fulfilling prophecy.

PHP needs to be left to the ghetto of guestbooks and the odd bulletin board system. Leave the real work to the real languages.


Why the bashing of other languages like PHP here in perlbuzz? I'm not disagreeing with the content and I can agree with the sentiment. But it just seems like the wrong message for this particular outlet.


Short version: The shortsightedness of PHP is a crucial distinction between PHP and other languages. (See also: Lack of namespaces) I want to help make sure that people who are on the fence realize the severity of their decision.

quote from DBD::Pg version 2.2.2:

"Although PostgreSQL supports cursors, they have not been used in the current implementation. When DBD::Pg was defined, cursors in PostgreSQL could only be used inside a transaction block. Because only one transaction block at a time is allowed, this would have implied the restriction not to use any nested SELECT statements. Therefore the execute method *fetches all data at once* into data structures located in the front-end application. This fact must to be considered when selecting large amounts of data!"

so you have actually the same problem in perl when
using DBD::Pg

I don't know what the difference is then, Perlpunk, but we have no similar problems when we walk the DB with DBD::Pg.

I agree with the first comment. More positivity about Perl (and programming in general), less bashing of anything.

I think the (usual) quality of perlbuzz is rapidly making it a go-to place for showing people neat things about Perl (I know I point people here not infrequently), and this sort of heated, personal editorial feels really out of place.

Carry on.

I think the difference is simply that the PHP pg library slurps all info in after the execute phase for some reason, while DBI does not do that.

DBI is excellent, it tries very hard to be fast and use as little memory as possible.

The PHP mysql and pg interfaces are wrong in that respect that they don't emphasize the separation of the prepare and execute stages and their performance impact, but now it appears they don't separate execute and fetch properly either.

Personally, I'm not sure what I'd do without the prepare, execute and fetch distinctions.

Less bashing of other languages please. Let's be nice.

And yeah, the documentation for DBD::Pg does lead one to believe that it too is rather slurp happy because of lack of cursor support. Maybe a doc patch is in order if this isn't really happening in the real world.

I would recommend trying PDO (PDO_PGSQL) instead of the raw PG function calls.

We tried using PDO as well, and the results were the same. If you check out the source in ext/pdo_pgsql, you'll see it has the same problem: It asks for the number of rows in the result set even if the calling app has no need for it.

"However, the PHP PostgreSQL functions slurp up all six million rows, even though my calling code doesn't need to see all the rows at once."

I think that libpq may actually be responsible for this behavor, I know this has been an issue with DBD::Pg. A workable solution is to use a cursor.

Leave a comment

Job hunting for programmers

Land the Tech Job You Love, Andy Lester's guide to job hunting for programmers and other technical professionals, is available in PDF, ePub and .mobi formats, all DRM-free, as well as good old-fashioned paper.