Archiving and information preservation

Analysis of technologies related to database archiving and information preservation.

July 15, 2012

Issues in regulatory compliance

From time to time, I hear of regulatory requirements to retain, analyze, and/or protect data in various ways. It’s hard to get a comprehensive picture of these, as they vary both by industry and jurisdiction; so I generally let such compliance issues slide. Still, perhaps I should use one post to pull together what is surely a very partial list.

Most such compliance requirements have one of two emphases: Either you need to keep your customers’ data safe against misuse, or else you’re supposed to supply information to government authorities. From a data management and analysis standpoint, the former area mainly boils down to:

Information security. This can include access control, encryption, masking, auditing, and more.
Keeping data in an approved geographical area. (E.g., its country of origin.) This seems to be one of the three big drivers for multi-data-center processing (along with latency and disaster recovery), and hence is an influence upon numerous users’ choices in areas such as clustering and replication.

The latter, however, has numerous aspects.

First, there are many purposes for the data retention and analysis, including but by no means limited to: Read more

Categories: Archiving and information preservation, Clustering, Data warehousing, Health care, Investment research and trading, Text

November 12, 2011

Clarifying SAND’s customer metrics, positioning and technical story

Talking with my clients at SAND can be confusing. That said:

I need to revise my figures for SAND’s customer count way downward.
SAND finally has a reasonably clear positioning.
SAND’s product actually seems to have a lot of features.

A few months ago, I wrote:

SAND Technology reported >600 total customers, including >100 direct.

Upon talking with the company, I need to revise that figure downward, from >600 to 15.

Categories: Archiving and information preservation, Columnar database management, Data mart outsourcing, Data warehousing, Database compression, Market share and customer counts, Parallelization, Predictive modeling and advanced analytics, SAND Technology, Specific users, Workload management

October 10, 2011

Text data management, Part 1: Confusion

This is Part 1 of a three-post series. The posts cover:

There’s much confusion about the management of text data, among technology users, vendors, and investors alike. Reasons seem to include:

The terminology around text data is inaccurate.
Data volume estimates for text are misleading.
Multiple different technologies are in the mix, including:
- Enterprise text search.
- Text analytics — text mining, sentiment analysis, etc.
- Document stores — e.g. document-oriented NoSQL, or MarkLogic.
- Log management and parsing — e.g. Splunk.
- Text archiving — e.g., various specialty email archiving products I couldn’t even name.
- Public web search — Google et al.
Text search vendors have disappointed, especially technically.
Text analytics vendors have disappointed, especially financially.
Other analytic technology vendors ignore what the text analytic vendors actually have accomplished, and reinvent inferior wheels rather than OEM the state of the art.

Above all: The use cases for text data vary greatly, just as the use cases for simply-structured databases do.

There are probably fewer people now than there were six years ago who need to be told that text and relational database management are very different things. Other misconceptions, however, appear to be on the rise. Specific points that are commonly overlooked include: Read more

Categories: Analytic technologies, Archiving and information preservation, Google, Log analysis, MarkLogic, NoSQL, Oracle, Splunk, Text

September 22, 2011

Teradata Columnar and Teradata 14 compression

Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by “Teradata 14” I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14’s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster Data (now part of Teradata).

The basic idea of Teradata Columnar is:

Each table can be stored in Teradata in row format, column format, or a mix.
You can do almost anything with a Teradata columnar table that you can do with a row-based one.
If you choose column storage, you also get some new compression choices.
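To make the row-versus-column distinction concrete, here is a minimal sketch in Python. It is not Teradata's implementation: the sample table, the helper names `to_columns` and `run_length_encode`, and the choice of run-length encoding as the example compression scheme are all assumptions made purely for illustration. It only shows why grouping a column's values together can open up compression choices that row storage does not offer as naturally.

```python
from itertools import groupby

# Hypothetical sample table in row format: one tuple per row.
ROWS = [
    (1, "US", "open"),
    (2, "US", "open"),
    (3, "US", "closed"),
    (4, "UK", "closed"),
]
COLUMN_NAMES = ("id", "country", "status")


def to_columns(rows, names):
    """Pivot row-format data into column format: one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Illustrative only: real columnar engines use their own formats and encodings.
columns = to_columns(ROWS, COLUMN_NAMES)
encoded_country = run_length_encode(columns["country"])
encoded_status = run_length_encode(columns["status"])
print(encoded_country)
print(encoded_status)
```

In row format, the repeated country and status values are interleaved with the ids, so runs of equal values are not adjacent. Once the values of one column sit next to each other, a simple scheme such as run-length encoding can shrink them.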

Categories: Archiving and information preservation, Columnar database management, Data warehousing, Database compression, Oracle, Rainstor, Teradata

July 5, 2011

Eight kinds of analytic database (Part 2)

In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Buying processes, Cloud computing, Columnar database management, Data mart outsourcing, Data types, Data warehouse appliances, Data warehousing, Database compression, Database diversity, EAI, EII, ETL, ELT, ETLT, Greenplum, Hadoop, Investment research and trading, Log analysis, MapReduce, MOLAP, MySQL, Netezza, NoSQL, Open source, Petabyte-scale data management, Predictive modeling and advanced analytics, Rainstor, SAND Technology, Scientific research, SenSage, Software as a Service (SaaS), Streaming and complex event processing (CEP), Telecommunications, Vertica Systems, Web analytics

June 11, 2010

Rainstor update

I was tired and cranky when I talked with my former clients at Rainstor (formerly Clearpace) yesterday, so our call was shorter than it otherwise might have been. Anyhow, there’s a new version called Rainstor 4, the two main themes of which are:

Compliance-specific features.
Bottleneck Whack-A-Mole.

The point is that Rainstor is focusing its efforts on enterprises that: Read more

Categories: Archiving and information preservation, Rainstor

I’ll be speaking in Washington, DC on May 6

My clients at Aster Data are putting on a sequence of conferences called “Big Data Summit(s)”, and wanted me to keynote one. I agreed to the one in Washington, DC, on May 6, on the condition that I would be allowed to start with the same liberty and privacy themes I started my New England Database Summit keynote with. Since I already knew Aster to be one of the multiple companies in this industry that is responsibly concerned about the liberty and privacy threats we’re all helping cause, I expected them to agree to that condition immediately, and indeed they did.

On a rough-draft basis, my talk concept is:

Implications of New Analytic Technology in four areas:

Liberty & privacy
Data acquisition & retention
Data exploration
Operationalized analytics

I haven’t done any work yet on the talk besides coming up with that snippet, and probably won’t until the week before I give it. Suggestions are welcome.

If anybody actually has a link to a clear discussion of legislative and regulatory data retention requirements, that would be cool. I know they’ve exploded, but I don’t have the details.

Categories: Analytic technologies, Archiving and information preservation, Aster Data, Data warehousing, Presentations, Surveillance and privacy

April 4, 2010

The retention of everything

I’d like to reemphasize a point I’ve been making for a while about data retention: Read more

Categories: Archiving and information preservation, Surveillance and privacy, Web analytics

December 11, 2009

Notes on RainStor, the company formerly known as Clearpace

Information preservation* DBMS vendor Clearpace officially changed its name to RainStor this week. RainStor is also relocating its CEO John Bantleman and more generally its headquarters to San Francisco. This all led to a visit with John and his colleague Ramon Chen, highlights of which included: Read more

Categories: Archiving and information preservation, Market share and customer counts, Oracle, Rainstor, SenSage, Telecommunications

November 23, 2009

Boston Big Data Summit keynote outline

Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Cloud computing, Clustering, Columnar database management, Data warehouse appliances, Data warehousing, DBMS product categories, Humor, Investment research and trading, Log analysis, MapReduce, Market share and customer counts, NoSQL, OLTP, Open source, Parallelization, Presentations, Pricing, Solid-state memory, Storage, Telecommunications, Theory and architecture, Web analytics

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
