Application areas

Posts focusing on the use of database and analytic technologies in specific application domains. Related subjects include:

Any subcategory
(in Text Technologies) Specific application areas for text analytics

October 10, 2010

Notes and links October 10 2010

More quick-hit notes, links, and so on: Read more

Categories: Analytic technologies, Aster Data, Data warehousing, Greenplum, Health care, Surveillance and privacy, XtremeData

eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more

I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: Read more

Categories: Data warehousing, Derived data, eBay, Greenplum, Hadoop, HBase, Log analysis, Petabyte-scale data management, Teradata

30 Comments

October 3, 2010

Notes and links October 3 2010

Some notes, follow-up, and links before I head out to California: Read more

Categories: GIS and geospatial, Google, HP and Neoview, Humor, Kickfire, Netezza, Solid-state memory, Teradata, Web analytics

3 Comments

September 27, 2010

A rant about medical records

It is very difficult to convey utterly tedious frustration without — well, without thoroughly boring one’s audience. And hence I will not try to explain the full awfulness of modern medical records and information compartmentalization. But I was personally present 5 times in one recent week while Linda gave detailed information about her contact information, medical history, etc. — and all 5 times it was to the same hospital.

In our case, that just costs time. But the information flow in my father’s case upsets me more. Read more

Categories: Health care, Surveillance and privacy

2 Comments

September 21, 2010

How to tell whether you need ACID-compliant transaction integrity

In a post about the recent JPMorgan Chase database outage, I suggested that JPMorgan Chase’s user profile database was over-engineered, in that various web surfing data was stored in a fully ACID-compliant manner when it didn’t really need to be. I’ve since gotten private communication expressing vehement agreement, and telling of the opposite choice being made in other major web-facing transactional systems.

What’s going on is this:

ACID-compliant transaction integrity commonly costs more in terms of DBMS licenses and many other components of TCO (Total Cost of Ownership) than less rigorous approaches.
Worse, it can actually hurt application uptime, by forcing your system to pull in its horns and stop functioning in the face of failures that a non-transactional system might smoothly work around.
Other flavors of “complexity can be a bad thing” apply as well.

Thus, transaction integrity can be more trouble than it’s worth.

In essence, of course, that’s half of the classic NoSQL claim; the other half is that the same may be said of joins.
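To make the trade-off concrete, here is a minimal sketch of what transaction integrity buys you, using Python's built-in sqlite3 module. The accounts table, the transfer function, and the fail_midway flag are all hypothetical names invented for illustration; nothing here reflects how JPMorgan Chase or any other system mentioned above is actually built.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst inside a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Hypothetical failure injected between the two writes.
                raise RuntimeError("simulated crash between the two writes")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

If the simulated failure fires, the debit is rolled back and no money vanishes. That guarantee matters for balances; for something like a record of which pages a user viewed, silently losing one write is usually tolerable, which is the kind of case where the stricter machinery may not pay for itself.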

So when should you go for ACID-compliant transaction integrity, and when shouldn’t you bother? Every situation is different, but here’s a set of considerations to start you off. Read more

Categories: NoSQL, Web analytics

12 Comments

September 13, 2010

Reconciling medical privacy and elder care

In a previous post, I outlined how Friendship Village of Dublin has mishandled my father’s medical information, to the detriment of his medical care. Expanding on that story, here are some other complications or screw-ups in the same series of medical events. In these other cases, the blame clearly falls more on the information-flow system itself, rather than on some particular medical care provider such as Friendship Village of Dublin, Riverside Methodist Hospital, or the paramedics who transported my father from one to the other.

Categories: Health care, Surveillance and privacy

3 Comments

August 11, 2010

Big Data is Watching You!

There’s a boom in large-scale analytics. The subjects of this analysis may be categorized as:

People
Financial trades
Electronic networks
Everything else

The most varied, interesting, and valuable of those four categories is the first one.

Categories: Aster Data, Data warehousing, Investment research and trading, Log analysis, MapReduce, Predictive modeling and advanced analytics, RDF and graphs, Specific users, Surveillance and privacy, Telecommunications, Web analytics

6 Comments

July 31, 2010

Nested data structures keep coming up, especially for log files

Nested data structures have come up several times now, almost always in the context of log files.

Google has published about a project called Dremel. Per Tasso Argyros, one of Dremel’s key concepts is nested data structures.
Those arrays that the XLDB/SciDB folks keep talking about are meant to be nested data structures. Scientific data is of course log-oriented. eBay was very interested in that project too.
Facebook’s log files have a big nested data structure flavor.

I don’t have a grasp yet on what exactly is happening here, but it’s something.
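For readers who want a picture of what "nested" means for a log record, here is a small illustrative sketch. The Session and Click types and their fields are assumptions made up for this example; they are not the actual schemas used by Dremel, SciDB, eBay, or Facebook.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Click:
    url: str
    ms_since_start: int

@dataclass
class Session:
    """One log record: a user plus a repeated, nested group of clicks."""
    user_id: str
    clicks: List[Click] = field(default_factory=list)

def flatten(session: Session) -> List[Tuple[str, str, int]]:
    """Expand one nested record into flat (user_id, url, ms) rows."""
    return [(session.user_id, c.url, c.ms_since_start) for c in session.clicks]

session = Session("u42", [Click("/home", 0), Click("/search", 1200)])
rows = flatten(session)
```

The nested form keeps everything about one session together in a single record, whereas the flattened rows repeat the user ID on every line and would need a join or group-by to reassemble the session.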

Categories: eBay, Facebook, Google, Log analysis, Scientific research, Theory and architecture

7 Comments

July 6, 2010

Cassandra technical overview

Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I’m finally finding time to clear my Cassandra/Riptano backlog. I’ll cover the more technical parts below, and the more business- or usage-oriented ones in a companion Cassandra/Riptano post.

Jonathan’s core claims for Cassandra include:

Cassandra is shared-nothing.
Cassandra has good approaches to replication and partitioning, right out of the box.
In particular, Cassandra is good for use cases that distribute a database around the world and want to access it at “local” latencies. (Indeed, Jonathan asserts that non-local replication is a significant non-big-data Cassandra use case.)
Cassandra’s scale-out is application-transparent, unlike sharded MySQL’s.
Cassandra is fast at both appends and range queries, which would be hard to accomplish in a pure key-value store.
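The last point can be illustrated with a toy sketch. This is not Cassandra's storage engine; SortedStore is a hypothetical class that only shows why keeping keys in sorted order makes range queries cheap, whereas a purely hash-based key-value store would have to examine every key.

```python
import bisect

class SortedStore:
    """Toy ordered key-value store: keys stay sorted so range scans are cheap."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value lookup

    def put(self, key, value):
        # Insert new keys in sorted position; a key larger than all
        # existing ones (an "append") simply lands at the end of the list.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

log = SortedStore()
log.put("2010-07-06T10:00", "login")
log.put("2010-07-06T10:05", "search")
log.put("2010-07-06T11:30", "logout")
morning = log.range("2010-07-06T10:00", "2010-07-06T11:00")
```

Timestamp-like keys are used here because they arrive in increasing order, which is the append-heavy pattern typical of log data.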

In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he’s concerned, may well belong in a more traditional SQL DBMS. Read more

Categories: Amazon and its cloud, Cassandra, DataStax, Facebook, Google, Log analysis, NoSQL, Open source, Parallelization

4 Comments

July 1, 2010

Why you should go to XLDB4

Scientific data commonly:

Comes in large volumes
Is machine-generated
Is augmented by synthetic and/or derived data
Has a spatial and/or temporal structure

In those respects, it is akin to some of the hottest areas for big data analytics, including:

Investment trade data – big, partly machine-generated, augmented (often), temporal
Web/network log data – big, machine-generated, post-processed into derived form, temporal
Marketing analytic data – big, post-processed into derived form
Genomic data

So when Jacek Becla started the XLDB conferences on the premise that scientific and big data analytic challenges have a lot in common, he had a point. There are several tough database problems that the science-focused folks have taken the lead in thinking about, but which are soon going to matter to the commercial world as well. And that’s one of two big reasons why you should consider participating in XLDB4, October 6-7, at the SLAC facility in Menlo Park, CA, as an attendee, sponsor, or both.

The other big reason is that it is important for the world that XLDB succeed. Read more

Categories: Investment research and trading, Log analysis, Scientific research, Web analytics

2 Comments
