Data types
Analysis of data management technology optimized for specific datatypes, such as text, geospatial, object, RDF, or XML. Related subjects include:
- Any subcategory
- Database diversity
Analytic performance — the persistent need for speed
Analytic DBMS and other analytic platform technologies are much faster than they used to be, both in absolute and price/performance terms. So the question naturally arises, “When is the performance enough?” My answer, to a first approximation, is “Never.” Obviously, your budget limits what you can spend on analytics, and anyhow the benefit of incremental expenditure at some point can grow quite small. But if analytic processing capabilities were infinite and free, we’d do a lot more with analytics than anybody would consider today.
I have two lines of argument supporting this view. One is application-oriented. Machine-generated data will keep growing rapidly. So using that data requires ever more processing resources as well. Analytic growth, rah-rah-rah; company valuation, sis-boom-bah. Application areas include but are not at all limited to marketing, law enforcement, investing, logistics, resource extraction, health care, and science.
The other approach is to point out some computational areas where vastly more analytic processing resources could be used than are available today. Consider, if you will, statistical modeling, graph analytics, optimization, and stochastic planning. Read more
Categories: Analytic technologies, RDF and graphs | Leave a Comment |
Teradata, Aster Data, and Teradata/Aster
Teradata is acquiring Aster Data. Naturally, the deal is being presented with a Treaty of Tordesillas kind of positioning — Teradata does X, Aster Data does Y, and everybody looks forward to having X and Y in the same product portfolio. That said, my initial positioning and product strategy thoughts on the Teradata/Aster combination go something like this. Read more
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, RDF and graphs, Specific users, Teradata | 9 Comments |
Notes on document-oriented NoSQL
When people talk about document-oriented NoSQL or some similar term, they usually mean something like:
Database management that uses a JSON model and gives you reasonably robust access to individual field values inside a JSON (JavaScript Object Notation) object.
Or, if they really mean,
The essence of whatever it is that CouchDB and MongoDB have in common.
well, that’s pretty much the same thing as what I said in the first place. 🙂
Of the various questions that might arise, three of the more definitional ones are:
- Why JSON rather than XML?
- What’s with this fluidity between the terms “document” and “object”?
- Are you serious about the lack of joins?
Let me take a crack at each. Read more
Categories: CouchDB, MapReduce, MarkLogic, MongoDB, NoSQL, Object, Structured documents | 16 Comments |
Notes, links, and comments January 20, 2011
I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)
First and foremost, the fourth annual New England Database Summit (nee “Day”) is next week, specifically Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.
The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.
One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more. Read more
Categories: About this blog, Analytic technologies, Data warehousing, GIS and geospatial, Investment research and trading, MongoDB, OLTP, Open source, PostgreSQL, Vertica Systems | 4 Comments |
Privacy dangers — an overview
This post is the first of a series. The second one delves into the technology behind the most serious electronic privacy threats.
The privacy discussion has gotten more active, and more complicated as well. A year ago, I still struggled to get people to pay attention to privacy concerns at all, at least in the United States, with my first public breakthrough coming at the end of January. But much has changed since then.
On the commercial side, Facebook modified its privacy policies, garnering great press attention and an intense user backlash, leading to a quick partial retreat. The Wall Street Journal then launched a long series of articles — 13 so far — recounting multiple kinds of privacy threats. Other media joined in, from Forbes to CNet. Various forms of US government rule-making to inhibit advertising-related tracking have been proposed as an apparent result.
In the US, the government had a lively year as well. The Transportation Security Administration (TSA) rolled out what have been dubbed “porn scanners,” and backed them up with “enhanced patdowns.” For somebody who is, for example, female, young, a sex abuse survivor, and/or a follower of certain religions, those can be highly unpleasant, if not traumatic. Meanwhile, the Wikileaks/Cablegate events have spawned a government reaction whose scope is only beginning to be seen. A couple of “highlights” so far are some very nasty laptop seizures, and the recent demand for information on over 600,000 Twitter accounts. (Christopher Soghoian provided a detailed, nuanced legal analysis of same.)
At this point, it’s fair to say there are at least six different kinds of legitimate privacy fear. Read more
Categories: Analytic technologies, Facebook, GIS and geospatial, Health care, Surveillance and privacy, Telecommunications, Web analytics | 6 Comments |
The six useful things you can do with analytic technology
I seem to be in the mode of sharing some of my frameworks for thinking about analytic technology. Here’s another one.
Ultimately, there are six useful things you can do with analytic technology:
- You can make an immediate decision.
- You can plan in support of future decisions.
- You can research, investigate, and analyze in support of future decisions.
- You can monitor what’s going on, to see when it necessary to decide, plan, or investigate.
- You can communicate, to help other people and organizations do these same things.
- You can provide support, in technology or data gathering, for one of the other functions.
Technology vendors often cite similar taxonomies, claiming to have all the categories (as they conceive them) nicely represented, in slickly integrated fashion. They exaggerate. Most of these categories are in rapid flux, and the rest should be. Analytic technology still has a long way to go.
In more detail: Read more
Categories: Analytic technologies, Business intelligence, Cognos, Data warehousing, RDF and graphs, Text | 13 Comments |
Evolving definitions and technology categories for 2011
It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. Read more
Categories: Analytic technologies, Data types, Data warehousing, DBMS product categories, MOLAP, Theory and architecture | 6 Comments |
Document-oriented DBMS without joins
When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for Xquery, but not for anything like SQL.)
Categories: CouchDB, MarkLogic, NoSQL, Structured documents, Text, Theory and architecture | 8 Comments |
MarkLogic and its document DBMS
This post has been long in the writing for several reasons, the biggest being that I stopped working for almost a month due to family issues. Please forgive its particularly choppy writing style; having waited this long already, I now lack the patience to further clean it up.
MarkLogic:
- Is an ACID-compliant, document-oriented, non-SQL, XML-based scale-out DBMS vendor of non-trivial size and momentum.
- Still has the same technical approach I previously described.
- Recently posted an internally-written white paper with a lot of technical detail.
- Recently had a point release — MarkLogic 4.2 — a lot of which seems to be “Oh, you didn’t have that before?” kinds of stuff.
- Has given me permission to post most of the slides from same, the first few of which give a nice overview of the MarkLogic story.
- Claims 200+ each of customers and employees (that’s from a slide MarkLogic did ask me to remove from the deck).
- Is a client again.
- Not coincidentally, is interested in branching out past the vertical markets of media and government/intelligence, in particular to the financial services market.
- Has finally rationalized its company and product names so that both are now “MarkLogic.” 🙂
- Has finally grasped that if it is proud of its ACID-compliance it probably shouldn’t be trying to market itself as “NoSQL”. 🙂
Vertica-Hadoop integration
DBMS/Hadoop integration is a confusing subject. My post on the Cloudera/Aster Data partnership awaits some clarification in the comment thread. A conversation with Vertica left me unsure about some Hadoop/Vertica Year 2 details as well, although I’m doing better after a follow-up call. On the plus side, we also covered some rather cool Hadoop/Vertica product futures, and those seemed easier to understand. 🙂
I say “Year 2” because Hadoop/Vertica integration has been going on since last year. Indeed, Vertica says that there are now over 25 users of the Hadoop/Vertica combination and hence Vertica’s Hadoop connector. Vertica is now introducing — for immediate GA — a new version of its Hadoop connector. So far as I understood: Read more