October 18, 2010

More notes on Membase and memcached

As a companion to my post about Membase last week, the company has graciously allowed me to post a rather detailed Membase slide deck. (It even has pricing.) Also, I left one point out.

Membase announced a Cloudera partnership. I couldn’t detect anything technically exciting about that, but it serves to highlight what I do find to be an interesting usage trend. A couple of big Web players (AOL and ShareThis) are using Hadoop to crunch data and derive customer profile data, then feed that back into Membase. Why Membase? Because it can serve up the profile in a millisecond, as part of a bigger 40-millisecond-latency request.
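To make that pattern concrete, here is a minimal sketch of it, not anything AOL or ShareThis actually runs. The `ProfileStore` class is a hypothetical in-memory stand-in for a key-value store such as Membase, `load_batch` stands in for whatever loads the Hadoop-derived profiles, and the `segment` field and the 40 ms default budget are illustrative assumptions.

```python
import time


class ProfileStore:
    """Hypothetical in-memory stand-in for a key-value store such as Membase."""

    def __init__(self):
        self._data = {}

    def load_batch(self, profiles):
        # Bulk-load profiles produced offline (assumed here to be a Hadoop job's output).
        self._data.update(profiles)

    def get(self, user_id):
        # Single key lookup; returns None when no profile exists.
        return self._data.get(user_id)


def handle_request(store, user_id, budget_ms=40.0):
    """Fetch the profile as one step of a larger request with a latency budget."""
    start = time.perf_counter()
    # Fall back to a default profile when the user has none stored.
    profile = store.get(user_id) or {"segment": "unknown"}
    lookup_ms = (time.perf_counter() - start) * 1000.0
    return {
        "user_id": user_id,
        "segment": profile["segment"],
        "lookup_ms": lookup_ms,
        # Time left for the rest of the request after the profile lookup.
        "remaining_ms": budget_ms - lookup_ms,
    }
```

The point of the split is that the expensive derivation happens in batch, while the serving path does nothing but a key lookup, which is what keeps it to a small slice of the overall budget.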

And why Hadoop, rather than Aster Data nCluster, which ShareThis also uses? Umm, I didn’t ask.

When I mentioned this to Colin Mahony, he said Vertica had similar stories. However, I don’t recall whether they were about Membase or just memcached, and he hasn’t had a chance to get back to me with clarification. (Edit: As per Colin’s comment below, it’s both.)

Categories: Aster Data, Cache, Cloudera, Couchbase, Hadoop, memcached, Memory-centric data management, NoSQL, Pricing, Specific users, Vertica Systems, Web analytics

7 Comments

October 6, 2010

eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more

I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: Read more

Categories: Data warehousing, Derived data, eBay, Greenplum, Hadoop, HBase, Log analysis, Petabyte-scale data management, Teradata

30 Comments

September 24, 2010

A little more on the JPMorgan Chase Oracle outage

Jaikumar Vijayan of Computerworld did a story based on my reporting on the JPMorgan Chase Oracle outage. He did a good job, getting me to simplify some of what I said before. 🙂 He also added a quote from Chase to the effect:

the “long recovery process” was caused by a corruption of systems data that disabled the bank’s “ability to process customer log-ins to chase.com”

While that’s true, and indeed is the reason I first referred to this as an “authentication” problem, I believe it to be incomplete. For example, the $132 million in missed ACH payments weren’t directly driven by log-ins; they were to be done on schedule, perhaps based on previous log-ins. Or as Jai and I put it in the guts of his story: Read more

Categories: JPMorgan Chase, Oracle

6 Comments

September 17, 2010

Details of the JPMorgan Chase Oracle database outage

After posting my speculation about the JPMorgan Chase database outage, I was contacted by – well, by somebody who wants to be referred to as “a credible source close to the situation.” We chatted for a long time; I think it is very likely that this person is indeed what s/he claims to be; and I am honoring his/her requests to obfuscate many identifying details. However, I need a shorter phrase than “a credible source close to the situation,” so I’ll refer to him/her as “Deep Packet.”

According to Deep Packet,

- The JPMorgan Chase database outage was caused by corruption in an Oracle database.
- This Oracle database stored user profiles, which are more than just authentication data.
- Applications that went down include but may not be limited to:
  - The main JPMorgan Chase portal.
  - JPMorgan Chase’s ability to use the ACH (Automated Clearing House).
  - Loan applications.
  - Private client trading portfolio access.
- The Oracle database was back up by 1:12 Wednesday morning. But on Wednesday a second problem occurred, namely an overwhelming number of web requests. This turned out to be a cascade of retries in the face of – and of course exacerbating – poor response time. While there was no direct connection to the database outage, Deep Packet is sympathetic to my suggestions that:
  - Network/app server traffic was bound to be particularly high as people tried to get caught up after the Tuesday outage, or just see what was going on in their accounts.
  - Given that Deep Packet said there was a definite operator-error contributing cause, perhaps the error would not have happened if people weren’t so exhausted from dealing with the database outage.
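
For readers who want a feel for how a retry cascade feeds on itself, here is a toy model. It is purely illustrative: the function name, the rates, and the assumption that every failed request is retried are all mine, not anything Deep Packet described about Chase's systems.

```python
def offered_load(base_rate, capacity, max_retries):
    """Estimate the total request rate when failed requests are retried.

    Toy model (an assumption, not a measurement): in each round, the share
    of traffic beyond capacity fails, and that share of the most recent
    wave of requests comes back as retries, for up to max_retries rounds.
    """
    total = base_rate
    retrying = base_rate
    for _ in range(max_retries):
        if total <= capacity:
            # The system keeps up, so nothing fails and nothing is retried.
            break
        failure_share = 1.0 - capacity / total
        # Only the failed part of the latest wave is retried.
        retrying = retrying * failure_share
        total += retrying
    return total
```

In this model, traffic below capacity is left alone, while traffic above capacity grows with every retry round, which is the "exacerbating" part: `offered_load(1200, 1000, 3)` comes out larger than `offered_load(1200, 1000, 1)`, and both exceed the original 1200.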

Deep Packet stressed the opinion that the Oracle outage was not the fault of JPMorgan Chase (the Wednesday slowdown is a different matter), and rather can be blamed on an Oracle bug. Read more

Categories: JPMorgan Chase, OLTP, Oracle

45 Comments

September 16, 2010

Speculation about the JPMorgan Chase authentication database outage

Edit: Subsequent to making this post, I obtained more detail about the JPMorgan Chase database outage.

I was just contacted for comment about the Chase database outage, about which they’ve released remarkably little information (they’ve even apologized for their terseness). About all Chase has said is:

A third-party database company’s software caused a corruption of systems information, disabling our ability to process customer log-ins to chase.com. This resulted in a long recovery process,

and even that quote is a bit hard to find. From other reporting, we know that ATMs, bank branches, and the call centers continued to work, but various web and mobile access applications were disabled.

Of course, that quote is pretty ambiguous. My thoughts on it include: Read more

Categories: Data types, JPMorgan Chase

11 Comments

August 26, 2010

More on NoSQL and HVSP (or OLRP)

Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):

Dwight Merriman of 10gen (MongoDB)
Damien Katz of Couchio (CouchDB)
Matt Pfeil of Riptano (Cassandra)
Todd Lipcon of Cloudera (HBase committer)
Tony Falco of Basho (Riak)
John Busch of Schooner
Ori Herrnstadt of Akiban

Categories: Akiban, Basho and Riak, Cache, Cassandra, Cloudera, Clustrix, CouchDB, DataStax, Facebook, Hadoop, HBase, memcached, MySQL, NewSQL, NoSQL, Object, OLTP, Open source, Parallelization, Schooner Information Technology, Theory and architecture, Tokutek and TokuDB

3 Comments

August 22, 2010

Workday comments on its database architecture

In my discussion of Workday’s technology, I gave an estimate that Workday’s database, if relationally designed, would require “1000s” of tables. That estimate came from Workday, Inc. CTO Stan Swete, in a thoughtful email that made several points about Workday’s database strategy. Workday kindly gave me permission to quote it below.
Read more

Categories: Data models and architecture, Object, OLTP, Software as a Service (SaaS), Specific users, Theory and architecture, Workday

3 Comments

August 22, 2010

The Workday architecture — a new kind of OLTP software stack

One of my coolest company visits in some time was to SaaS (Software as a Service) vendor Workday, Inc., earlier this month. Reasons included:

Workday has forward-thinking ideas about SaaS enterprise applications and the integration of business intelligence into same.
Workday has highly innovative ideas in how it manages data.
Companies founded by Dave Duffield tend to feature smart, likeable people who talk to one pleasantly and forthrightly. Workday is no exception; CTO Stan Swete and the other Workday folks present were a delight to talk with.
I’d invited Merv Adrian to come along with me. He asked great questions, and I could gather myself a bit despite how sleep-deprived I was for the first part of that trip.

Workday kindly allowed me to post this Workday slide deck. Otherwise, I’ve split out a quick Workday, Inc. company overview into a separate post.

The biggie for me was the data and object management part. Specifically: Read more

Categories: Business intelligence, Data integration and middleware, Data models and architecture, EAI, EII, ETL, ELT, ETLT, NoSQL, Object, OLTP, Software as a Service (SaaS), Specific users, Theory and architecture, Workday

13 Comments

August 18, 2010

I’m collecting data points on NoSQL and HVSP adoption

I was asked to do a magazine article on NoSQL, where by “NoSQL” is meant “whatever they talk about at NoSQL conferences.” By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in general, NoSQL and SQL alike.

It also is understood that, realistically, I can’t be expected to know and mention the very latest news for all the many products in the categories. Even so, I think this would be a fine time to check just where NoSQL and HVSP adoption stand. Here is most of what I know, or links to same; it would be great if you guys would contribute additional data in the comment thread.

In the NoSQL area: Read more