Telecommunications
Posts about database and analytic technologies applied to the telecommunications industry, especially in call detail record (CDR) applications. Related subjects include:
Cassandra company DataStax (formerly Riptano) is on track
Riptano, the Cassandra company, has changed its name to DataStax. DataStax has opened headquarters in Burlingame and hired some database-experienced folks – notably Ben Werther from Greenplum and Michael Weir from ParAccel, with Zenobia Godschalk (who worked with Aster Data) somewhere in the outside PR mix. Other than that, what’s new at DataStax is pretty much what could have been expected based on what DataStax folks said last spring.
Most notably, DataStax is introducing a software offering, whose full name is DataStax OpsCenter for Apache Cassandra. DataStax OpsCenter for Apache Cassandra seems to be, in essence, a monitoring tool for Cassandra clusters, with a bit of capacity planning bundled in. (If there are any outright operations parts to DataStax OpsCenter, they got overlooked in our conversation.)* Read more
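DataStax didn't go into OpsCenter's internals with me, but stock Apache Cassandra exposes cluster state over JMX, which is presumably the sort of thing any Cassandra monitoring tool polls. As a purely illustrative sketch (the host name is hypothetical; the MBean name and port 7199 are per recent Apache Cassandra builds, while older releases used port 8080):

```java
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraHealthPoll {
    public static void main(String[] args) throws Exception {
        // Hypothetical host; 7199 is the default Cassandra JMX port.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // StorageService is Cassandra's top-level cluster-state MBean.
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            @SuppressWarnings("unchecked")
            List<String> live = (List<String>) mbs.getAttribute(ss, "LiveNodes");
            System.out.println("Live nodes: " + live);
        }
    }
}
```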
Categories: Cassandra, DataStax, Market share and customer counts, NoSQL, Specific users, Telecommunications | 1 Comment |
The technology of privacy threats
This post is the second of a series. The first one was an overview of privacy dangers, replete with specific examples of kinds of data that are stored for good reasons, but can also be repurposed for more questionable uses. More on this subject may be found in my August, 2010 post Big Data is Watching You!
There are two technology trends driving electronic privacy threats. Taken together, these trends raise scenarios such as the following:
- Your web surfing behavior indicates you’re a sports car buff, and you further like to look at pictures of scantily-clad young women. A number of your Facebook friends are single women. As a result, you’re deemed at risk of having a mid-life crisis and divorcing your wife, thus increasing the interest rate you have to pay when refinancing your house.
- Your cell phone GPS indicates that you drive everywhere, instead of walking. There is no evidence of you pursuing fitness activities, but forum posting activity suggests you’re highly interested in several TV series. Your credit card bills show that your taste in restaurant food tends to the fatty. Your online photos make you look fairly obese, and a couple have ashtrays in them. As a result, you’re judged a high risk of heart attack, and your medical insurance rates are jacked up accordingly.
- You did actually have that mid-life crisis and get divorced. At the child-custody hearing, your ex-spouse’s lawyer quotes a study showing that football-loving upper income Republicans are 27% more likely to beat their children than yoga-class-attending moderate Democrats, and the probability goes up another 8% if they ever bought a jersey featuring a defensive lineman. What’s more, several of the more influential people in your network of friends also fit angry-male patterns, taking the probability of abuse up another 13%. Because of the sound statistics behind such analyses, the judge listens.
Not all these stories are quite possible today, but they aren’t far off either.
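For what it’s worth, the way such stacked percentages combine is simple arithmetic, if one naively treats the custody-hearing story’s made-up figures as independent multiplicative relative risks — exactly the kind of assumption an over-credulous analyst might make:

```java
public class RiskCompounding {
    public static void main(String[] args) {
        // The story's fictional figures: +27%, +8%, +13%,
        // naively compounded as independent relative risks.
        double relativeRisk = 1.27 * 1.08 * 1.13;
        System.out.printf("Compounded relative risk: ~%.2f%n", relativeRisk); // ~1.55
    }
}
```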
Categories: Facebook, Predictive modeling and advanced analytics, Surveillance and privacy, Telecommunications, Web analytics | 4 Comments |
Privacy dangers — an overview
This post is the first of a series. The second one delves into the technology behind the most serious electronic privacy threats.
The privacy discussion has gotten more active, and more complicated as well. A year ago, I still struggled to get people to pay attention to privacy concerns at all, at least in the United States, with my first public breakthrough coming at the end of January. But much has changed since then.
On the commercial side, Facebook modified its privacy policies, garnering great press attention and an intense user backlash, which led to a quick partial retreat. The Wall Street Journal then launched a long series of articles — 13 so far — recounting multiple kinds of privacy threats. Other media joined in, from Forbes to CNet. Apparently as a result, various forms of US government rule-making to inhibit advertising-related tracking have been proposed.
In the US, the government had a lively year as well. The Transportation Security Administration (TSA) rolled out what have been dubbed “porn scanners,” and backed them up with “enhanced patdowns.” For somebody who is, for example, female, young, a sex abuse survivor, and/or a follower of certain religions, those can be highly unpleasant, if not traumatic. Meanwhile, the Wikileaks/Cablegate events have spawned a government reaction whose scope is only beginning to be seen. A couple of “highlights” so far are some very nasty laptop seizures, and the recent demand for information on over 600,000 Twitter accounts. (Christopher Soghoian provided a detailed, nuanced legal analysis of same.)
At this point, it’s fair to say there are at least six different kinds of legitimate privacy fear. Read more
Categories: Analytic technologies, Facebook, GIS and geospatial, Health care, Surveillance and privacy, Telecommunications, Web analytics | 6 Comments |
Introduction to Kaminario
At its core, the Kaminario story is simple:
- Throw out your disks and replace them with, not Flash, but actual DRAM.
- Your IOPS (Input/Output Operations Per Second) are so high* that you get the performance you need without any further system changes. (See the back-of-envelope sketch after the footnote below.)
- The whole thing is very fast to set up.
In other words, Kaminario pitches a value proposition something like (my words, not theirs) “A shortcut around your performance bottlenecks.”
*1 million or so on the smallest Kaminario K2 appliance.
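To put ~1 million IOPS in perspective, here’s a back-of-envelope comparison; the per-disk figure is a common rule of thumb for 15K RPM drives doing random I/O, not a Kaminario number:

```java
public class IopsBackOfEnvelope {
    public static void main(String[] args) {
        // Rule-of-thumb random-I/O figure for a 15K RPM disk;
        // actual numbers vary with workload and controller.
        double iopsPerDisk = 180.0;
        double k2Iops = 1_000_000.0; // smallest K2 appliance, per Kaminario
        System.out.printf("Disks needed to match on IOPS alone: ~%.0f%n",
                          k2Iops / iopsPerDisk); // ~5,556 spindles
    }
}
```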
Kaminario asserts that both analytics and OLTP (OnLine Transaction Processing) are represented in its user base. Even so, the use cases Kaminario mentioned seemed to be concentrated on the analytic side. I suspect there are two main reasons:
- As Kaminario points out, OLTP apps are commonly designed to perform acceptably in the face of regrettable I/O wait.
- Also, analytic performance problems tend to arise more suddenly than OLTP ones do.*
*Somebody can think up a new analytic query overnight that takes 10 times the processing of anything they’ve ever run before. Or they can get the urge to run the same queries 10 times as often as before. Both those kinds of thing happen less often in the OLTP world.
Accordingly, Kaminario likes to sell against the alternative of getting a better analytic DBMS, stressing that you can get a Kaminario K2 appliance into production a lot faster than you can move your processing to even the simplest data warehouse appliance. Kaminario is probably technically correct in saying that; even so, I suspect it would often make more sense to view Kaminario K2 appliances as a transition technology, by which I mean:
- You have an annoying performance problem.
- Kaminario K2 could solve it very quickly.
- That buys you time for a more substantive fix.*
- If you want, you can redeploy your Kaminario K2 storage to solve your next-worst performance bottleneck.
On that basis, I could see Kaminario-like devices eventually getting to the point that every sufficiently large enterprise should have some of them, whether or not that enterprise has an application it believes should run permanently against DRAM block storage. Read more
Categories: Investment research and trading, Kaminario, Solid-state memory, Storage, Telecommunications, Web analytics | 7 Comments |
Big Data is Watching You!
There’s a boom in large-scale analytics. The subjects of this analysis may be categorized as:
- People
- Financial trades
- Electronic networks
- Everything else
The most varied, interesting, and valuable of those four categories is the first one.
VoltDB finally launches
VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end at 12:01 am. Let’s go straight to the technical highlights:
- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other; see the sketch after this list.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
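To make the single-partition point concrete, here is a hypothetical VoltDB stored procedure (my sketch, not VoltDB sample code). It assumes an ORDERS table partitioned on CUST_ID, and it totals amounts in Java per the missing-SUM point above; the annotation style is that of early VoltDB releases:

```java
import org.voltdb.ProcInfo;
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical schema: ORDERS is partitioned on CUST_ID, so this whole
// transaction runs in the one thread that owns that partition --
// no locks, no latches, no two-phase commit.
@ProcInfo(partitionInfo = "ORDERS.CUST_ID: 0", singlePartition = true)
public class SumOrdersForCustomer extends VoltProcedure {
    public final SQLStmt select =
        new SQLStmt("SELECT AMOUNT FROM ORDERS WHERE CUST_ID = ?;");

    public long run(long custId) {
        voltQueueSQL(select, custId);
        VoltTable rows = voltExecuteSQL()[0];
        long sum = 0;
        // SUM() isn't in VoltDB's SQL subset, so total in Java instead;
        // the rows are already in RAM, so the loop adds little cost.
        while (rows.advanceRow()) {
            sum += rows.getLong(0);
        }
        return sum;
    }
}
```

A client would then invoke it by name, e.g. client.callProcedure("SumOrdersForCustomer", custId) via VoltDB’s Java client library, and the transaction runs to completion in that single per-partition thread.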
More on Sybase IQ, including Version 15.2
Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I’m finally getting around to doing so. Highlights include but are not limited to:
- Slide 2 has some market success figures and so on. (>3100 copies at >1800 users, >200 sales last year)
- Slides 6-11 give more detail on Sybase’s indexing and data access methods than I put into my recent technical basics of Sybase IQ post.
- Slide 16 reminds us that Sybase IQ’s in-database data mining is quite competitive with what SAS has actually delivered with its DBMS partners, even if it doesn’t have the nice architectural approach of Aster or Netezza. (I.e., Sybase IQ’s more-than-SQL advanced analytics story relies on C++ UDFs — User-Defined Functions — running in-process with the DBMS.) In particular, there’s a data mining/predictive analytics library — modeling and scoring both — licensed from a small third party.
- A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)
Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven’t gotten around to yet.
More recently, Sybase volunteered permission for me to preannounce Sybase IQ Version 15.2 by a few days (it’s scheduled to come out this week). Read more
Greenplum Chorus and Greenplum 4.0
Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.
Greenplum 4.0 highlights and related observations include: Read more
Examples of machine-generated data
Not long ago I pointed out that much future Big Data growth will be in the area of machine-generated data, examples of which include: Read more
Categories: Analytic technologies, Data warehousing, Games and virtual worlds, Investment research and trading, Log analysis, Oracle, Telecommunications, Web analytics | 27 Comments |
Three broad categories of data
People often try to draw a distinction between:
- Traditional data of the sort that’s stored in relational databases, aka “structured.”
- Everything else, aka “unstructured” or “semi-structured” or “complex.”
There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. But of the many reasons why these distinctions don’t tend to work very well, I think the most important one is that:
Databases shouldn’t be divided into just two categories. Even as a rough-cut approximation, they should be divided into three, namely:
- Human/Tabular data — i.e., human-generated data that fits well into relational tables or arrays
- Human/Nontabular data — i.e., all other data generated by humans
- Machine-Generated data
Even that trichotomy is grossly oversimplified, for reasons such as:
- These categories overlap.
- There are kinds of data that get into fuzzy border zones.
- Not all data in each category has all the same properties.
But at least as a starting point, I think this basic categorization has some value. Read more