DBMS product categories
Analysis of database management technology in specific product categories.
What Nielsen really uses in data warehousing DBMS
In its latest earnings call, Oracle made a reference to The Nielsen Company that was — to put it politely — rather confusing. I just plopped down in a chair next to Greg Goff, who evidently runs data warehousing at Nielsen, and had a quick chat. Here’s the real story.
- The Nielsen Company has over half a petabyte of data on Netezza in the US. This installation is growing.
- The Nielsen Company indeed has 45 terabytes or whatever of data on Oracle in its European (Customer) Information Factory. This is not particularly growing. Nielsen’s Oracle data warehouse has been built up over the past 9 years. It’s not new. It’s certainly not on Exadata, nor planned to move to Exadata.
- These are not single-instance databases. Nielsen’s biggest single Netezza database is 20 terabytes or so of user data, and its biggest single Oracle database is 10 terabytes or so.
- Much (most?) of the rest of the installation consists of customer data marts and the like, based in each case on the “big” central database. (That’s actually a classic data mart use case.) Greg said that Netezza’s capabilities to spin out those databases seemed pretty good.
- That 10 terabyte Oracle data warehouse instance requires a lot of partitioning effort and so on in the usual way.
- Nielsen has no immediate plans to replace Oracle with Netezza.
- Nielsen actually has 800 terabytes or so of Netezza equipment. Some of that is kept more lightly loaded, for performance.
Categories: Analytic technologies, Data mart outsourcing, Data warehouse appliances, Data warehousing, Netezza, Oracle, Specific users | 6 Comments |
Thoughts on the integration of OLTP and data warehousing, especially in Exadata 2
Oracle is pushing Exadata 2 as being a great system for any of OLTP (OnLine Transaction Processing), data warehousing or, presumably, the integration of same. This claim rests on a few premises, namely: Read more
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Exadata, OLTP, Oracle, Solid-state memory, Theory and architecture | 36 Comments |
The hunt for Oracle Exadata production references
Over the past four weeks, I’ve given speeches in Boston, DC, Milan, London, and SF,* attended a conference in Lyon, done a fair amount of consulting, and taken a few non-client briefings as well. That’s why I haven’t had much of a chance to sit down, analyze the tea leaves, and write about Exadata 2. (Small exception: Highlights from and remarks on the Oracle Database 11g Release 2 white paper.) I hope to do that soon.
*I’ll bop over to Chicago for the last of the series early next week.
But first — can anybody identify much in the way of Exadata production references? Oracle recently talked of a few flagship data warehouse customers, but those don’t seem to be running Exadata. I talked recently with an Oracle prospect from the US, who only got one reference from Oracle — in Eastern Europe. (Well, two references, if you also count the system integrator on the same deal.)
So far as I can tell, Oracle Exadata production sites are pretty thin on the ground. What, if anything, am I missing?
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Exadata, Market share and customer counts, Oracle | 17 Comments |
Notes on the Oracle Database 11g Release 2 white paper
The Oracle Database 11g Release 2 white paper I cited a couple of weeks ago has evidently been edited, given that a phrase I quoted last month is no longer to be found. Anyhow, here are some quotes from and comments on what appears to be the latest version. Read more
HadoopDB
Despite a thoughtful heads-up from Daniel Abadi at the time of his original posting about HadoopDB, I’m just getting around to writing about it now. HadoopDB is a research project carried out by a couple of Abadi’s students. Further research is definitely planned. But it seems too early to say whether HadoopDB will ever get past the “research and oh by the way the code is open sourced” stage and become a real code line, whether commercialized, open source, or both.
The basic idea of HadoopDB is to put copies of a DBMS at different nodes of a grid, and use Hadoop to parcel work among them; a toy sketch follows the list below. The major benefits claimed versus a conventional massively parallel DBMS are:
- Open/cheap/free
- Query fault-tolerance
- The related concept of tolerating node degradation that isn’t an outright node failure.
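For concreteness, here’s a minimal, runnable sketch of that division of labor. This is my illustration, not the HadoopDB code: sqlite3 stands in for the per-node DBMS instances, a plain loop stands in for Hadoop’s parallel scheduling, and the schema, query, and data are all invented.

```python
import sqlite3
from collections import defaultdict

def make_node(rows):
    """One 'node': an independent single-node DBMS holding one horizontal partition."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (region TEXT, sales REAL)")
    db.executemany("INSERT INTO facts VALUES (?, ?)", rows)
    return db

# Three nodes, each with its own partition of the data (hypothetical values).
nodes = [
    make_node([("east", 10.0), ("west", 5.0)]),
    make_node([("east", 7.0), ("north", 2.0)]),
    make_node([("west", 1.0)]),
]

# The same SQL fragment is pushed down to every node's local DBMS.
PARTITION_SQL = "SELECT region, SUM(sales) FROM facts GROUP BY region"

def run_query(sql):
    # In HadoopDB proper, Hadoop schedules these per-node tasks in parallel,
    # and if a node fails or slows down mid-query it reruns just that task
    # on another copy of the partition -- the query fault tolerance and
    # degradation tolerance claimed above.
    totals = defaultdict(float)
    for node in nodes:
        for region, partial_sum in node.execute(sql):
            totals[region] += partial_sum  # "reduce" step merges partial results
    return dict(totals)

print(run_query(PARTITION_SQL))  # {'east': 17.0, 'west': 5.0, 'north': 2.0}
```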
HadoopDB has actually been built with PostgreSQL. That version achieved performance well below that of a commercial DBMS “DBX”, where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with VectorWise at the nodes instead. (Recall that VectorWise is shared-everything.) It will be interesting to see how that configuration performs.
In my opinion, however, the real opportunity for HadoopDB may lie elsewhere. Read more
Introduction to the XLDB and SciDB projects
Before I write anything else about the overlapping efforts known as XLDB and SciDB, I probably should explain and disambiguate what they are as best I can. XLDB was organized and still is run by guys who want to solve scientific problems in eXtremely Large DataBase management, most especially Jacek Becla of SLAC (the organization previously known as Stanford Linear Accelerator Center). Becla’s original motivation is that he needs a DBMS to manage what will be 55 petabytes of raw image data and 100 petabytes of astronomical data in total for LSST (Large Synoptic Survey Telescope). Read more
Categories: Data models and architecture, Database diversity, eBay, Michael Stonebraker, Open source, Petabyte-scale data management, Scientific research, Theory and architecture | 2 Comments |
What could or should make Oracle/MySQL antitrust concerns go away?
When the Oracle/MySQL deal was first announced, I wrote:
I can probably come up with business practices that could make things very hard on Oracle/MySQL competitors … but I haven’t found a compelling antitrust trigger on my first pass over the subject.
Subsequently, there’s been a lot of discussion about whether or not Oracle can use control of MySQL to make life difficult for third-party MySQL storage engine vendors.
Now the European Commission is delaying the Oracle/Sun deal, explicitly because of Oracle/MySQL antitrust fears. That is, the European Commission wants to be reassured that an Oracle takeover of MySQL won’t unduly impinge upon the future availability of open source/low cost DBMS alternatives. This raises the natural question:
What could Oracle do to assure concerned parties that its ownership of MySQL won’t unduly hamper open-source-based DBMS competition?
I think that’s indeed the crucial question. The Oracle/Sun deal has enough momentum at this point that it both should and will be allowed to happen — perhaps with safeguards — rather than banned outright. If you have concerns about Oracle’s pending acquisition of MySQL, you should speak up and outline what kinds of regulatory safeguards would alleviate the problems you foresee.
More or less obvious possibilities include:
- Divest MySQL. This is obviously an extreme measure, but it surely would work.
- Provide some money and trademark rights to MySQL forkers. If MariaDB and Drizzle were put into strong competitive positions with MySQL today, it’s hard to see how regulators could object to any future maneuverings Oracle might envision with the GPLed side of MySQL.
- Offer a standard, attractive, long-term deal to MySQL bundlers. The commercial/non-GPL version of MySQL is a requirement for appliance vendors (surely), OEM vendors (probably), and storage engine vendors (maybe — I disagree, but I’m evidently in the minority).
- Strengthen PostgreSQL. 🙂 Realistically, that’s not going to be part of any Oracle/MySQL resolution, so I’ll leave it as a subject for another time.
Categories: Mid-range, MySQL, Open source, Oracle, PostgreSQL | 9 Comments |
Teradata really means that those 100+ appliances are in PRODUCTION
I was misremembering. It turns out that when Teradata said it had over 100 appliances “in production”, it meant that >100 hardware-based appliances are actually in production. If you add in the software-only “appliances,” and count test/development as well as true production, the total rises to >200.
I tried to get a finer breakdown out of Teradata on a disclosable basis, but failed. The ostensible reason is that public companies often don’t do that sort of thing without permission from the investor relations department, and Teradata’s marketers evidently haven’t felt a sense of urgency about getting permission to, for example, communicate how well just the 25xx series is doing.
Categories: Data warehouse appliances, Data warehousing, Market share and customer counts, Teradata | 1 Comment |
Continuent on clustering
Robert Hodges, CTO of my client Continuent, put up a blog post laying out his and Continuent’s views on database clustering. Continuent offers Tungsten, its third try at database clustering technology, targeted at MySQL, PostgreSQL, and perhaps Oracle. Unlike Continuent’s more ambitious second-generation product, Tungsten offers single-master replication, which in Robert’s view allows for great ease of deployment and administration (he likes the phrase “bone-simple”).
The downside to Continuent Tungsten’s stripped-down architecture is that it doesn’t solve the most extreme performance scale-out problems. Instead, Continuent focuses on the other big benefits of keeping your data in more than one place, namely high availability and data loss prevention (i.e., backup).
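To show why that trade-off falls out of the architecture, here’s a bare-bones sketch of the single-master replication pattern in general; this is my illustration, not Continuent’s code, and the class and key names are invented. All writes funnel through one master, so write throughput doesn’t scale out, but every replica is a consistent extra copy for failover and backup.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []                  # ordered transaction log

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # replicas will replay this entry

class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0               # position reached in the master's log

    def catch_up(self, master):
        for key, value in master.log[self.applied:]:
            self.data[key] = value     # apply writes in commit order
        self.applied = len(master.log)

master, replica = Master(), Replica()
master.write("account:1", 100)
master.write("account:2", 250)
replica.catch_up(master)               # replica is now a usable failover/backup copy
assert replica.data == master.data
```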
Continuent has been around for a number of years, starting out in Finland but now being based in Silicon Valley. For most purposes, however, it’s reasonable to think of Continuent and Tungsten as start-up efforts.
As you might guess from the references to Finland and MySQL, Continuent’s products are open source, or at least have open source versions. I’m still a little fuzzy as to which features are open sourced and which are not. For that matter, I’m still unclear as to Tungsten’s feature list overall …
Categories: Clustering, Continuent, MySQL, Open source, PostgreSQL | 3 Comments |
SAS on Netezza and other Netezza extensibility
I chatted with SAS CTO Keith Collins yesterday about the new SAS/Netezza in-database parallel data mining scoring offering. My impression is that this is very similar to SAS’ current Teradata support, notwithstanding SAS’ and Teradata’s apparent original intention of offering in-database modeling by now as well.
I gather this is a big performance-enhancing deal, just as it is for SPSS or Oracle’s own data mining over Oracle. However, I must confess to not yet understanding why. That is, I don’t know what’s so complicated about data mining scoring algorithms that makes hand-coding them in SQL particularly forbidding. My naive view of data mining is that you do a big regression to get a bunch of weights, and the resulting scoring algorithm is a linear combination of a few dozen variables. Evidently, that’s not quite right.
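To make my puzzlement concrete, here’s a minimal sketch of that naive view, with column names and weights made up purely for illustration. If scoring really were just a linear combination, hand-translating the model into a single SQL expression would seem easy.

```python
# The naive view: a model is just an intercept plus a weighted sum of variables.
WEIGHTS = {"recency": -0.42, "frequency": 1.30, "monetary": 0.07}
INTERCEPT = 0.5

def score(row):
    """Score one record as a linear combination of its attributes."""
    return INTERCEPT + sum(w * row[col] for col, w in WEIGHTS.items())

print(score({"recency": 10.0, "frequency": 3.0, "monetary": 250.0}))

# Under that view, the same model would hand-code into one SQL expression,
# e.g. (hypothetical table and columns):
SQL = (
    "SELECT customer_id,"
    " 0.5 - 0.42 * recency + 1.30 * frequency + 0.07 * monetary AS score"
    " FROM customers"
)
```

Presumably what makes in-database scoring genuinely hard is that real models aren’t this simple, which is the “that’s not quite right” part above.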
Anyhow, it turns out that SAS held off on this work until it could be done for TwinFin. That’s largely because TwinFin lets partners write code on Intel CPUs, while previously they had to write in C for Netezza’s FPGAs. I got a similar sense from at least one other Netezza partner as well.