PostgreSQL
Analysis of the open source database management system PostgreSQL and other products in the PostgreSQL ecosystem.
Big scientific databases need to be stored somehow
A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:
- Microsoft just put out an overwrought press release. The substance seems to be that Pan-STARRS — a Jim Gray legacy also discussed in an August, 2008 Computerworld article — is adding 1.4 terabytes of image data per night, and one not so new database adds 15 terabytes per year of some kind of computer simulation output used to analyze protein folding. Both run on SQL Server, of course.
- Kognitio has an astronomical database too, at Cambridge University, adding 1/2 a terabyte of data per night.
- Oracle is used for a McGill University proteomics database called CellMapBase. The description cites 50 terabytes of “mass storage”, a figure that excludes tape backup and so on.
- The Large Hadron Collider, once it actually starts functioning, is projected to generate 15 petabytes of data annually, which will be initially stored on tape and then distributed to various computing centers around the world.
- Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I’m not thinking of a major customer it has in that area. (But then, if you just sell software, your academic discount can approach 100%; but if like Netezza you have an actual cost of goods sold, that’s not as appealing an option.)
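The figures in the list above are quoted in mixed units (per night, per year, terabytes, petabytes), so a rough normalization helps when comparing them. The sketch below is only back-of-the-envelope arithmetic. It assumes data arrives on all 365 nights of the year (an upper bound for a telescope) and that 1 petabyte is 1,000 terabytes; neither assumption comes from the sources cited.

```python
# Normalize the growth figures quoted above to terabytes per year.
# Assumptions (not from the sources): data arrives 365 nights a year,
# and 1 petabyte = 1,000 terabytes.
NIGHTS_PER_YEAR = 365
TB_PER_PB = 1000

growth_tb_per_year = {
    "Pan-STARRS (1.4 TB/night)": 1.4 * NIGHTS_PER_YEAR,
    "Protein-folding simulation (15 TB/year)": 15.0,
    "Kognitio astronomy (0.5 TB/night)": 0.5 * NIGHTS_PER_YEAR,
    "Large Hadron Collider (15 PB/year)": 15.0 * TB_PER_PB,
}

# Print from slowest-growing to fastest-growing.
for name, tb in sorted(growth_tb_per_year.items(), key=lambda kv: kv[1]):
    print(f"{name}: {tb:,.1f} TB/year")
```

Under those assumptions, the nightly astronomy feeds come out in the hundreds of terabytes per year, while the projected Large Hadron Collider output is well over an order of magnitude beyond that.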
Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.
Categories: Aster Data, Data types, Greenplum, IBM and DB2, Kognitio, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, PostgreSQL, Scientific research | 1 Comment |
Has there been any progress on SAP over Postgres?
Peter Eisentraut discouragingly reported in January:
What I hear from my acquaintances at SAP, however, is this:
- SAP doesn’t need fancy database features, since the software doesn’t use them.
- Those who don’t want to buy Oracle can use MaxDB; it’s free.
- PostgreSQL doesn’t support in-place upgrades, which makes it unsuitable for multiple terabyte installations typically used by SAP customers.
Has anything changed since then?
And as a trivia challenge, does anybody recognize my science fiction reference in the comment thread there? 🙂 Hint: The dialogue referenced did not occur on the planet Arrakis.
Categories: PostgreSQL | 2 Comments |
Top DBMS on Linux
I was looking up George Crump’s blogs in connection with his recent post on SSDs, and I stumbled upon one that outlines at great length what features Linux backup systems should have. I won’t claim to have read it word for word, but what did catch my eye were a couple of comments on DBMS market share, which boiled down to:
- Oracle
- MySQL
- PostgreSQL
Categories: IBM and DB2, Market share and customer counts, MySQL, Oracle, PostgreSQL | Leave a Comment |
Mike Stonebraker’s counterarguments to MapReduce’s popularity
In response to recent posting I’ve done about MapReduce, Mike Stonebraker just got on the phone to give me his views. His core claim, more or less, is that anything you can do in MapReduce you could already do in a parallel database that complies with SQL-92 and/or has PostgreSQL underpinnings. In particular, Mike says: Read more
Categories: Data warehousing, MapReduce, Michael Stonebraker, PostgreSQL | 5 Comments |
Greenplum is in the big leagues
After a March, 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights – besides some really great sushi at Sakae in Burlingame – start with an eye-opening set of customer proof points, such as: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Greenplum, Petabyte-scale data management, PostgreSQL | 19 Comments |
EnterpriseDB update
I had lunch today with CTO Bob Zurek of EnterpriseDB, who turns out to live in almost the same town I do (they technically separated in 1783, but share a high school today). DBMS-related highlights included:
- EnterpriseDB thinks PostgreSQL training and certification are a big deal for increasing PostgreSQL adoption.
- EnterpriseDB’s business focus right now (at least, one of them) is moving developers from interest to download to deployment and payment — i.e., the standard funnel for open source and open-source-inspired products.
- EnterpriseDB finds it important to be a good PostgreSQL community citizen. This makes a lot of sense, as EnterpriseDB doesn’t control the core PostgreSQL engine, even if it does employ some of the core PostgreSQL developers.
- But “open source” is not the same as “free”.
- I got the impression that the GridSQL technology EnterpriseDB acquired is being used to go after general read-mostly, horizontally-scaling applications (i.e., MySQL’s sweet spot). I did not get the impression, by way of contrast, that EnterpriseDB is out to play catch-up (e.g., with Greenplum) in MPP data warehousing.
- Bob pointed out that something like “Vacuum” to clean up the database periodically is needed in an MVCC (MultiVersion Concurrency Control) engine. He thinks PostgreSQL’s autovacuum is good but not ideal.
- Bob draws this as yet another two-dimensional positioning graph, but in essence he thinks PostgreSQL and Postgres Plus are well-suited for a large space that’s above MySQL and below Oracle. I don’t think he really contradicted Kee Kwan’s opinion that there are good times to use PostgreSQL and good times to use MySQL.
- I was wrong when I previously said EnterpriseDB now offers MySQL portability. It just offers MySQL migration.
- The Elastra/EnterpriseDB cloud offering isn’t generally available yet.
- Stay tuned for developments in replication/high availability.
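Regarding the point above that an MVCC engine needs something like “Vacuum”: the toy model below is a minimal sketch of why. An update writes a new row version instead of overwriting the old one, so dead versions pile up until something reclaims them. The class and method names are hypothetical, and the model ignores transaction visibility entirely (a real engine can only remove versions that no running transaction can still see), so it is an illustration rather than a description of PostgreSQL internals.

```python
class ToyMVCCTable:
    """Toy model: each update appends a new row version and marks the old one dead."""

    def __init__(self):
        # Each version is a dict with keys: key, value, dead.
        self.versions = []

    def upsert(self, key, value):
        # Mark the current live version (if any) dead rather than overwriting it.
        for version in self.versions:
            if version["key"] == key and not version["dead"]:
                version["dead"] = True
        self.versions.append({"key": key, "value": value, "dead": False})

    def get(self, key):
        # Readers see only the live version.
        for version in self.versions:
            if version["key"] == key and not version["dead"]:
                return version["value"]
        return None

    def vacuum(self):
        # Reclaim dead versions; return how many were removed.
        before = len(self.versions)
        self.versions = [v for v in self.versions if not v["dead"]]
        return before - len(self.versions)


table = ToyMVCCTable()
for balance in (100, 90, 75):
    table.upsert("account-1", balance)

print(len(table.versions))  # three stored versions for a single logical row
print(table.vacuum())       # two dead versions reclaimed
print(table.get("account-1"))
```

Even in this toy, storage grows with the number of updates rather than the number of rows until `vacuum` runs, which is the bloat that periodic vacuuming is meant to keep in check.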
Categories: EnterpriseDB and Postgres Plus, Mid-range, Open source, PostgreSQL | 1 Comment |
Microsoft is buying DATAllegro
I’ve long argued that:
- Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
- DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include: Read more
Pushback on the PostgreSQL vs. MySQL comparison
It should come as no surprise that not everybody agrees with EnterpriseDB’s views on the PostgreSQL/MySQL comparison. In particular, the High Availability MySQL blog offers a detailed rebuttal post, with more in the comment thread. According to MySQL fans, EnterpriseDB got its facts wrong on several matters regarding MySQL and InnoDB, especially in the areas of triggers and locking. And of course they disagree with EnterpriseDB’s general conclusion. 🙂
Categories: MySQL, Open source, PostgreSQL | Leave a Comment |
PostgreSQL vs. MySQL, as per EnterpriseDB
EnterpriseDB put out a white paper arguing for the superiority of PostgreSQL over MySQL, even without EnterpriseDB’s own Postgres Plus extensions. Highlights of EnterpriseDB’s opinion include:
- EnterpriseDB asserts that MyISAM is the only MySQL storage engine with decent performance.
- EnterpriseDB then bashes MyISAM for all sorts of well-deserved reasons, especially ACID-noncompliance.
- EnterpriseDB asserts that row-level triggers, lacking in MySQL but present in PostgreSQL, are the most important kind of trigger.
- EnterpriseDB claims PostgreSQL is superior in procedural language support to MySQL.
- EnterpriseDB claims PostgreSQL is superior in authentication support to MySQL.
Categories: EnterpriseDB and Postgres Plus, Mid-range, MySQL, Open source, PostgreSQL | 17 Comments |
Yahoo scales its web analytics database to petabyte range
Information Week has an article with details on what sounds like Yahoo’s core web analytics database. Highlights include:
- The Yahoo web analytics database is over 1 petabyte. Yahoo claims it will be in the tens of petabytes by 2009.
- The Yahoo web analytics database is based on PostgreSQL. So much for MySQL fanboys’ claims of Yahoo validation for their beloved toy … uh, let me rephrase that. The highly-regarded MySQL, although doing a great job for some demanding and impressive applications at Yahoo, evidently wasn’t selected for this one in particular. OK. That’s much better now.
- But the Yahoo web analytics database doesn’t actually use PostgreSQL’s storage engine. Rather, Yahoo wrote something custom and columnar.
- Yahoo is processing 24 billion “events” per day. The article doesn’t clarify whether these are sent straight to the analytics store, or whether there’s an intermediate storage engine. Most likely the system fills blocks in RAM and then just appends them to the single persistent store. If commodity boxes occasionally crash and lose a few megs of data — well, in this application, that’s not a big deal at all.
- Yahoo thinks commercial column stores aren’t ready yet for more than 100 terabytes of data.
- Yahoo says it got great performance advantages from a custom system by optimizing for its specific application. I don’t know exactly what that would be, but I do know that database architectures for high-volume web analytics are still in pretty bad shape. In particular, there’s no good way yet to analyze the specific, variable-length paths users take through websites.
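The guess above, that events fill blocks in RAM which are then simply appended to a single persistent store, can be sketched as follows. This is purely illustrative: the class name, the newline-delimited file format, and the tiny block size are all assumptions made for the example, and nothing here reflects Yahoo’s actual design.

```python
import os
import tempfile


class AppendOnlyEventStore:
    """Hypothetical sketch: buffer events in RAM, append full blocks to one file."""

    def __init__(self, path, block_size=4):
        self.path = path
        self.block_size = block_size  # events per block; tiny for illustration
        self.buffer = []

    def record(self, event):
        # Events are assumed to be single-line strings.
        self.buffer.append(event)
        if len(self.buffer) >= self.block_size:
            self.flush()

    def flush(self):
        # Append the whole block in one write; nothing is ever updated in place.
        if not self.buffer:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.write("\n".join(self.buffer) + "\n")
        self.buffer.clear()


def demo():
    """Record 10 events with 4-event blocks; return (persisted, still_in_ram)."""
    with tempfile.TemporaryDirectory() as tmp:
        store = AppendOnlyEventStore(os.path.join(tmp, "events.log"), block_size=4)
        for i in range(10):
            store.record(f"event-{i}")
        with open(store.path, encoding="utf-8") as f:
            persisted = len(f.read().splitlines())
        return persisted, len(store.buffer)


print(demo())
```

In the demo, two full blocks reach the file and the last two events are still only in memory. A crash at that moment would lose them, which is the trade-off described above as acceptable for this kind of application.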