A little more on the JPMorgan Chase Oracle outage
Jaikumar Vijayan of Computerworld did a story based on my reporting on the JPMorgan Chase Oracle outage. He did a good job, getting me to simplify some of what I said before. 🙂 He also added a quote from Chase to the effect that:
the “long recovery process” was caused by a corruption of systems data that disabled the bank’s “ability to process customer log-ins to chase.com”
While that’s true, and indeed is the reason I first referred to this as an “authentication” problem, I believe it to be incomplete. For example, the $132 million in missed ACH payments weren’t directly driven by log-ins; they were to be done on schedule, perhaps based on previous log-ins. Or as Jai and I put it in the guts of his story:
“Not everything in the user profile database needed to be added via ACID transactions,” he said. It’s likely that even if some of the Web usage data had been lost, it would not have impinged on the integrity of the bank’s financial dealings, he said. “At a minimum recovery would have been much shorter had the data not been there,” he said.
The fact that problems with a single database affected the main Web portal, its Automated Clearing House functions and loan applications suggests that the product was a single point of failure for too many applications, Monash said.
“This was a large and complex database that when it went bad brought down many applications,” he said. It’s not clear if the benefits of tying so many applications to a single database exceeded the risks in this case, he added.
Comments
6 Responses to “A little more on the JPMorgan Chase Oracle outage”
(Sorry if I already posted this.) Yes, you don’t need ACID on the profile for normal situations. It just does not matter if something in the profile changes while you’re doing a transaction, if the transactions are short.
I am confused. Was it ACID that caused the database meltdown?
If not, then what is the point of bringing it in?
To “prove” that ACID is “bad”?
What a joke…
It was ACID that caused the database size and update volume. It was database size and update volume that drove the duration of the recovery.
Whether proper segmentation of the data could have altogether prevented the ACH and loans parts of the outage is less clear.
ACID is not a database fault, nor is it a cause of database size and update volume. That is totally incorrect!
What caused the problem was a database software fault.
Can we please stop the newspeak of calling ACID a software fault?
I don’t care which applecart is being pushed here, but at least please remain technically correct! Enough with the misinformation!
Noons,
While most people seem to have understood what I wrote, for some reason I seem to have confused you pretty badly. I’m sorry. Let’s try one more time.
Is that clearer now?