Sloppy and lazy database handling: PHP's shortcomings, part #47

Here's yet another reason why PHP is deficient when it comes to doing Real Work with Real Databases. We're currently working on a PostgreSQL-based database with six million book records. We want to walk the book table and slurp up the rows as we go. The code looks like this.

$sql = 'select field1, field2, ... field8 from book';
$set = pg_query( $sql );
while ( $row = pg_fetch_row( $set ) ) {
    // Do some processing
}
pg_free_result( $set );

Nothing crazy, right? I'm just walking the table, one row at a time. However, the PHP PostgreSQL functions slurp up all six million rows, even though my calling code doesn't need to see all the rows at once. This eats up tons of RAM and makes us worry about memory allocation for the PHP process. too bad for us users! We can't even get to the first pg_fetch_row() call because the pg_query slurps up all the rows immediately.

Now, there IS functionality where you can retrieve a given row, like:

$row = pg_fetch_row( $set, $rowno );

In this case, it's necessary to iterate to that given $rowno, but even then you wouldn't have to get to the end of the dataset.

If that's not lazy enough coding on the back end, check out this little nugget from the docs for the pg_free_result function:

pg_free_result() frees the memory and data associated with the specified PostgreSQL query result resource.

This function need only be called if memory consumption during script execution is a problem. Otherwise, all result memory will be automatically freed when the script ends.

The PHP docs encourage sloppy coding. Rather than promoting good, safe programming behaviors, they're saying "Eh, no big deal." It's like the PHP team knows that PHP won't be used on large datasets, and it becomes a self-fulfilling prophecy.

PHP needs to be left to the ghetto of guestbooks and the odd bulletin board system. Leave the real work to the real languages.

0 TrackBacks

Listed below are links to blogs that reference this entry: Sloppy and lazy database handling: PHP's shortcomings, part #47.

TrackBack URL for this entry: http://perlbuzz.com/cgi-bin/mt/mt-tb.cgi/362

10 Comments

ydna.livejournal.com Author Profile Page said:
Why the bashing of other languages like PHP here in perlbuzz? I'm not disagreeing with the content and I can agree with the sentiment. But it just seems like the wrong message for this particular outlet. MOR PURL, KTHX
Andy Lester Author Profile Page said:
Short version: The shortsightedness of PHP is a crucial distinction between PHP and other languages. (See also: Lack of namespaces) I want to help make sure that people who are on the fence realize the severity of their decision.
perlpunk Author Profile Page said:
quote from DBD::Pg version 2.2.2:
"Although PostgreSQL supports cursors, they have not been used in the current implementation. When DBD::Pg was defined, cursors in PostgreSQL could only be used inside a transaction block. Because only one transaction block at a time is allowed, this would have implied the restriction not to use any nested SELECT statements. Therefore the execute method *fetches all data at once* into data structures located in the front-end application. This fact must to be considered when selecting large amounts of data!"
so you have actually the same problem in perl when using DBD::Pg
Andy Lester Author Profile Page said:
I don't know what the difference is then, Perlpunk, but we have no similar problems when we walk the DB with DBD::Pg.
Shawn Boyette said:
I agree with the first comment. More positivity about Perl (and programming in general), less bashing of anything. I think the (usual) quality of perlbuzz is rapidly making it a go-to place for showing people neat things about Perl (I know I point people here not infrequently), and this sort of heated, personal editorial feels really out of place. Carry on.
szbalint said:
I think the difference is simply that the PHP pg library slurps all info in after the execute phase for some reason, while DBI does not do that. DBI is excellent, it tries very hard to be fast and use as little memory as possible. The PHP mysql and pg interfaces are wrong in that respect that they don't emphasize the separation of the prepare and execute stages and their performance impact, but now it appears they don't separate execute and fetch properly either. Personally, I'm not sure what I'd do without the prepare, execute and fetch distinctions.
Less bashing of other languages please. Let's be nice. And yeah, the documentation for DBD::Pg does lead one to believe that it too is rather slurp happy because of lack of cursor support. Maybe a doc patch is in order if this isn't really happening in the real world.
Torgny Author Profile Page said:
I would recommend trying PDO (PDO_PGSQL) instead of the raw PG function calls.
Andy Lester Author Profile Page said:
We tried using PDO as well, and the results were the same. If you check out the source in ext/pdo_pgsql, you'll see it has the same problem: It asks for the number of rows in the result set even if the calling app has no need for it.
"However, the PHP PostgreSQL functions slurp up all six million rows, even though my calling code doesn't need to see all the rows at once." I think that libpq may actually be responsible for this behavor, I know this has been an issue with DBD::Pg. A workable solution is to use a cursor.

Leave a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

About this Entry

This page contains a single entry by Andy Lester published on March 14, 2008 11:23 AM.

USENIX opens conference proceedings was the previous entry in this blog.

rt.cpan.org source released is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.

Other Perl Sites

Other Swell Blogs

  • geek2geek: An ongoing analysis of how geeks communicate, how we fail and how to fix it.
Technorati Profile