Application areas

Posts focusing on the use of database and analytic technologies in specific application domains. Related subjects include:

August 17, 2013

Aerospike 3

My clients at Aerospike are coming out with their Version 3, and as several of my clients do, have encouraged me to front-run what otherwise would be the Monday embargo.

I encourage such behavior with arguments including:

Aerospike 2’s value proposition, let us recall, was:

… performance, consistent performance, and uninterrupted operations …

  • Aerospike’s consistent performance claims are along the lines of sub-millisecond latency, with 99.9% of responses being within 5 milliseconds, and even a node outage only borking performance for some 10s of milliseconds.
  • Uninterrupted operation is a core Aerospike design goal, and the company says that to date, no Aerospike production cluster has ever gone down.

The major support for such claims is Aerospike’s success in selling to the digital advertising market, which is probably second only to high-frequency trading in its low-latency demands. For example, Aerospike’s CMO Monica Pal sent along a link to what apparently is:

Read more

August 4, 2013

Data model churn

Perhaps we should remind ourselves of the many ways data models can be caused to churn. Here are some examples that are top-of-mind for me. They do overlap a lot — and the whole discussion overlaps with my post about schema complexity last January, and more generally with what I’ve written about dynamic schemas for the past several years..

Just to confuse things further — some of these examples show the importance of RDBMS, while others highlight the relational model’s limitations.

The old standbys

Product and service changes. Simple changes to your product line many not require any changes to the databases recording their production and sale. More complex product changes, however, probably will.

A big help in MCI’s rise in the 1980s was its new Friends and Family service offering. AT&T couldn’t respond quickly, because it couldn’t get the programming done, where by “programming” I mainly mean database integration and design. If all that was before your time, this link seems like a fairly contemporaneous case study.

Organizational changes. A common source of hassle, especially around databases that support business intelligence or planning/budgeting, is organizational change. Kalido’s whole business was based on accommodating that, last I checked, as were a lot of BI consultants’. Read more

July 20, 2013

The refactoring of everything

I’ll start with three observations:

As written, that’s probably pretty obvious. Even so, it’s easy to forget just how pervasive the refactoring is and is likely to be. Let’s survey some examples first, and then speculate about consequences. Read more

July 12, 2013

More notes on predictive modeling

My July 2 comments on predictive modeling were far from my best work. Let’s try again.

1. Predictive analytics has two very different aspects.

Developing models, aka “modeling”:

More precisely, some modeling algorithms are straightforward to parallelize and/or integrate into RDBMS, but many are not.

Using models, most commonly:

2. Some people think that all a modeler needs are a few basic algorithms. (That’s why, for example, analytic RDBMS vendors are proud of integrating a few specific modeling routines.) Other people think that’s ridiculous. Depending on use case, either group can be right.

3. If adoption of DBMS-integrated modeling is high, I haven’t noticed.

Read more

June 13, 2013

How is the surveillance data used?

Over the past week, discussion has exploded about US government surveillance. After summarizing, as best I could, what data the government appears to collect, now I ‘d like to consider what they actually do with it. More precisely, I’d like to focus on the data’s use(s) in combating US-soil terrorism. In a nutshell:

Consider the example of Tamerlan Tsarnaev:

In response to this 2011 request, the FBI checked U.S. government databases and other information to look for such things as derogatory telephone communications, possible use of online sites associated with the promotion of radical activity, associations with other persons of interest, travel history and plans, and education history.

While that response was unsuccessful in preventing a dramatic act of terrorism, at least they tried.

As for actual success stories — well, that’s a bit tough. In general, there are few known examples of terrorist plots being disrupted by law enforcement in the United States, except for fake plots engineered to draw terrorist-leaning individuals into committing actual crimes. One of those examples, that of Najibullah Zazi, was indeed based on an intercepted email — but the email address itself was uncovered through more ordinary anti-terrorism efforts.

As for machine learning/data mining/predictive modeling, I’ve never seen much of a hint of it being used in anti-terrorism efforts, whether in the news or in my own discussions inside the tech industry. And I think there’s a great reason for that — what would they use for a training set? Here’s what I mean.  Read more

June 10, 2013

Where things stand in US government surveillance

Edit: Please see the comment thread below for updates. Please also see a follow-on post about how the surveillance data is actually used.

US government surveillance has exploded into public consciousness since last Thursday. With one major exception, the news has just confirmed what was already thought or known. So where do we stand?

My views about domestic data collection start:

*Recall that these comments are US-specific. Data retention legislation has been proposed or passed in multiple countries to require recording of, among other things, all URL requests, with the stated goal of fighting either digital piracy or child pornography.

As for foreign data: Read more

April 25, 2013

Analytic application themes

I talk with a lot of companies, and repeatedly hear some of the same application themes. This post is my attempt to collect some of those ideas in one place.

1. So far, the buzzword of the year is “real-time analytics”, generally with “operational” or “big data” included as well. I hear variants of that positioning from NewSQL vendors (e.g. MemSQL), NoSQL vendors (e.g. AeroSpike), BI stack vendors (e.g. Platfora), application-stack vendors (e.g. WibiData), log analysis vendors (led by Splunk), data management vendors (e.g. Cloudera), and of course the CEP industry.

Yeah, yeah, I know — not all the named companies are in exactly the right market category. But that’s hard to avoid.

Why this gold rush? On the demand side, there’s a real or imagined need for speed. On the supply side, I’d say:

2. More generally, most of the applications I hear about are analytic, or have a strong analytic aspect. The three biggest areas — and these overlap — are:

Also arising fairly frequently are:

I’m hearing less about quality, defect tracking, and equipment maintenance than I used to, but those application areas have anyway been ebbing and flowing for decades.

Read more

April 23, 2013

MemSQL scales out

The third of my three MySQL-oriented clients I alluded to yesterday is MemSQL. When I wrote about MemSQL last June, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today.

MemSQL’s flagship reference is Zynga, across 100s of servers. Beyond that, the company claims (to quote a late draft of the press release):

Enterprises are already using distributed MemSQL in production for operational analytics, network security, real-time recommendations, and risk management.

All four of those use cases fit MemSQL’s positioning in “real-time analytics”. Besides Zynga, MemSQL cites penetration into traditional low-latency markets — financial services (various subsectors) and ad-tech.

Highlights of MemSQL’s new distributed architecture start: Read more

April 1, 2013

Some notes on new-era data management, March 31, 2013

Hmm. I probably should have broken this out as three posts rather than one after all. Sorry about that.

Performance confusion

Discussions of DBMS performance are always odd, for starters because:

But in NoSQL/NewSQL short-request processing performance claims seem particularly confused. Reasons include but are not limited to:

MongoDB and 10gen

I caught up with Ron Avnur at 10gen. Technical highlights included: Read more

February 13, 2013

It’s hard to make data easy to analyze

It’s hard to make data easy to analyze. While everybody seems to realize this — a few marketeers perhaps aside — some remarks might be useful even so.

Many different technologies purport to make data easy, or easier, to an analyze; so many, in fact, that cataloguing them all is forbiddingly hard. Major claims, and some technologies that make them, include:

*Complex event/stream processing terminology is always problematic.

My thoughts on all this start:  Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.