Clarifying the state of MPP in-database SAS
I am routinely briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza TwinFin(i), and Aster Data nCluster.
However, I chatted briefly Thursday with Michelle Wilkie, the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood it, the story is: Read more
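Pending the details behind that link, the general idea of in-database analytics is easy to illustrate: instead of extracting rows to a SAS server, a model’s scoring function is compiled into something the DBMS can execute in parallel, next to the data. Here’s a minimal sketch of that idea (sqlite3 standing in for an MPP DBMS, with an invented linear model; nothing here reflects SAS’s actual implementation):

```python
# Sketch of the in-database scoring idea: push the model to the data,
# rather than pulling the data to the analytics engine.
# sqlite3 stands in for an MPP DBMS; the model and column names are invented.
import sqlite3

# A toy "trained model": intercept plus per-column coefficients.
model = {"intercept": 0.5, "age": 0.02, "balance": 0.0001}

def scoring_sql(model, table):
    """Compile the model into a SQL expression the DBMS executes in place."""
    terms = " + ".join(f"{coef} * {col}" for col, coef in model.items()
                       if col != "intercept")
    return f"SELECT id, {model['intercept']} + {terms} AS score FROM {table}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 35, 12000.0), (2, 52, 300.0)])

for row in conn.execute(scoring_sql(model, "customers")):
    print(row)  # ~ (1, 2.4) and (2, 1.57): scores computed inside the database
```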
Categories: Aster Data, Data warehouse appliances, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Specific users, Teradata | 11 Comments |
Vertica update
Last month, Vertica’s CEO Ralph Breslauer quit,* and Vertica made it sound as if there would be a new CEO late in April. And indeed, as of April 29, there was: Chris Lynch, a guy I’d never heard of before, but apparently quite the sales-machine builder. The most substance I’ve found is a pair of Mass High Tech articles — the latter exceedingly typo-ridden — to the general effect that:
- Vertica plans to build a massive, world-conquering sales force.
- If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
- “Triple-digit” revenue growth is expected for this year.
Greenplum Chorus and Greenplum 4.0
Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.
Greenplum 4.0 highlights and related observations include: Read more
Information found in public-facing social networks
Here are some examples illustrating two recent themes of mine, namely:
- Easily-available information reveals all sorts of things about us.
- Graph-based analysis is on the rise.
Pete Warden scraped all of Facebook’s social graph (at least for the United States) and put up a really interesting-looking visualization of same. Facebook’s lawyers came down on him, and he quickly agreed to destroy the data he’d scraped, but he also published ideas on how other people could duplicate his work.
Warden has since given an interview in which he outlines some of the things researchers hoped to do with this data: Read more
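For flavor, here’s the kind of analysis a scraped social graph invites. A minimal sketch using networkx, with invented names and friendships:

```python
# A taste of graph-based analysis over a tiny, invented social graph.
# Assumes the networkx library is installed (pip install networkx).
import networkx as nx

friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
               ("dave", "erin")]  # two separate communities

g = nx.Graph()
g.add_edges_from(friendships)

# Who is most connected? Degree centrality is the simplest measure.
print(nx.degree_centrality(g))

# How does the population cluster? Connected components are a crude first cut;
# researchers typically go on to community detection, homophily measures, etc.
for component in nx.connected_components(g):
    print(sorted(component))
```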
Categories: Analytic technologies, Facebook, RDF and graphs, Surveillance and privacy | 1 Comment |
Quick news, links, comments, etc.
Some notes based on what I’ve been reading recently: Read more
Some business trends in the data warehouse market
In recent conversations with various analytic DBMS vendors, a fairly consistent picture has emerged.
- Business is strong. Multiple vendors claim to be going gangbusters, with the happy sounds coming out of Vertica and Infobright being echoed by several competitors. Hearsay suggests some other companies in related businesses are doing well too. Depending on who you talk to, the business pickup dates back to Q4, give or take a quarter.
- Oracle Exadata has become a formidable competitor, on the strength of Exadata 2. Exadata 2’s positioning and perception among Oracle users seem to be pretty much in line with what Oracle portrayed to me.
- Teradata is portrayed as a weak competitor. Competitors don’t worry about Teradata nearly as much as they do about Oracle. That said, I suspect a bit of wishful thinking; Teradata is clearly still getting a lot of business the other vendors would dearly love to have.
- HP Neoview is reeling. (Almost) nobody sees Neoview competitively. The Walmart Neoview installation is said to have stayed small at best. JPMorgan Chase is said to have completely thrown Neoview out (and a bunch of HP engineers with it).
- (Almost) nobody mentions competing against DB2 either. This continues to baffle me.
Categories: Analytic technologies, Data warehousing, Exadata, HP and Neoview, IBM and DB2, JPMorgan Chase, Market share and customer counts, Oracle, Teradata | 4 Comments |
Cassandra and the NoSQL scalable OLTP argument
Todd Hoff put up a provocative post on High Scalability called “MySQL and Memcached: End of an Era?” The post itself focuses on observations like:
- Facebook invented and is adopting Cassandra.
- Twitter is adopting Cassandra.
- Digg is adopting Cassandra.
- LinkedIn invented and is adopting Voldemort.
- Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.
But in addition, he provides a lot of useful links, which DBMS-oriented folks such as myself might have previously overlooked. Read more
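For readers who haven’t lived it, the “MySQL/Memcached era” pattern under discussion is cache-aside: check the cache, fall back to the database on a miss, and repopulate the cache on the way out. A minimal sketch, with a dict standing in for memcached and a stub standing in for MySQL:

```python
# The classic MySQL/Memcached cache-aside pattern, in miniature.
# A dict stands in for memcached; query_mysql is a stub for the real database.
import time

cache = {}            # key -> (value, expires_at)
TTL_SECONDS = 60

def query_mysql(user_id):
    # Stand-in for SELECT ... FROM users WHERE id = ?
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit and hit[1] > time.time():      # cache hit, still fresh
        return hit[0]
    value = query_mysql(user_id)          # cache miss: go to the database
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value

print(get_user(42))  # miss -> hits "MySQL", populates the cache
print(get_user(42))  # hit  -> served from the cache
```

The complaints the Cassandra and Voldemort adopters voice are, roughly, that this pattern leaves all writes on a single (or hand-sharded) MySQL tier and creates cache-invalidation headaches, whereas the newer stores distribute writes natively.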
Categories: Cassandra, Data models and architecture, NoSQL, OLTP, Open source, Parallelization, Specific users, Theory and architecture | 16 Comments |
Teradata’s nebulous cloud strategy
As the pun goes, Teradata’s cloud strategy is – well, it’s somewhat nebulous. More precisely, for the foreseeable future, Teradata’s cloud strategy is a collection of rather disjointed parts, including:
- What Teradata calls the Teradata Agile Analytics Cloud, which is a combination of previously existing technology plus one new portlet called the Teradata Elastic Mart(s) Builder. (Teradata’s Elastic Mart(s) Builder Viewpoint portlet is available for download from Teradata’s Developer Exchange.)
- Teradata Data Mover 2.0, coming “Soon”, which will ease copying (ETL without any significant “T”) from one Teradata system to another.
- Teradata Express DBMS crippleware (1 terabyte only, no production use), now available on Amazon EC2 and VMware. (I don’t see where this has much connection to the rest of Teradata’s cloud strategy, except insofar as it serves to fill out a slide.)
- Unannounced (and so far as I can tell largely undesigned) future products.
Teradata openly admits that its direction is heavily influenced by Oliver Ratzesberger at eBay. Like Teradata, Oliver and eBay favor virtual data marts over physical ones. That is, Oliver and eBay believe the ideal scenario is that every piece of data is stored only once, in an integrated Teradata warehouse. But eBay believes, and Teradata increasingly agrees, that users need a great deal of control over their use of this data, including the ability to import additional data into private sandboxes and join it to the warehouse data already there. Read more
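To make the sandbox idea concrete, here’s a minimal sketch (sqlite3 standing in for Teradata; all table and column names invented): the analyst loads private data into her own sandbox and joins it, inside the database, to the single shared copy of the warehouse data.

```python
# Sketch of a "private sandbox joined to the warehouse" -- sqlite3 stands in
# for Teradata, and all table/column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")

# The shared, integrated warehouse table (one copy, centrally managed).
conn.execute("CREATE TABLE warehouse_sales (customer_id INTEGER, revenue REAL)")
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?)",
                 [(1, 100.0), (2, 250.0)])

# The analyst's private sandbox: extra data she imported herself.
conn.execute("CREATE TABLE sandbox_targets (customer_id INTEGER, segment TEXT)")
conn.executemany("INSERT INTO sandbox_targets VALUES (?, ?)",
                 [(1, "churn-risk"), (2, "upsell")])

# The join happens inside the database; the warehouse data is never copied out.
query = """SELECT s.segment, SUM(w.revenue)
           FROM warehouse_sales w JOIN sandbox_targets s
             ON w.customer_id = s.customer_id
           GROUP BY s.segment"""
for row in conn.execute(query):
    print(row)
```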
Categories: Analytic technologies, Cloud computing, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, eBay, Teradata, Theory and architecture | 5 Comments |
General introduction to Splunk
I dropped by log analysis software vendor Splunk a few weeks ago for a chat with Marketing VP Steve Sommer (whom some of you may know from Cognos and/or Informix), Product Management VP Christina Noren, and above all co-founder/CTO Erik Swan. Splunk turns out to be a pretty interesting company, from both business and technical standpoints. For one thing, Splunk seems highly regarded by most people I mention it to.
Splunk’s technical stories include:
- Text search over log files.
- Business intelligence over text search. (That part sounds a lot like Attivio.)
- MapReduce with schema flexibility and smart multi-stage execution plans. (That part sounds a lot like Aster Data.)
More on those in a separate post.
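In the meantime, here’s a rough illustration of the third point. “Schema flexibility” in this context generally means schema-on-read: fields are extracted from raw log lines at query time, and the aggregation is a map/reduce over the extracted fields. A minimal sketch, with an invented log format (my illustration, not Splunk’s actual engine):

```python
# Schema-on-read over raw log lines, then a map/reduce-style aggregation.
# Invented log format; illustrative only -- not Splunk's actual engine.
import re
from collections import Counter

raw_logs = [
    "2010-05-06 12:00:01 status=200 user=alice",
    "2010-05-06 12:00:02 status=500 user=bob",
    "2010-05-06 12:00:03 status=200 user=alice",
]

FIELD = re.compile(r"(\w+)=(\S+)")

def extract(line):
    """Map step: impose structure at read time (no schema at load time)."""
    return dict(FIELD.findall(line))

def count_by(field, lines):
    """Reduce step: aggregate the extracted field across all events.
    A real engine runs the map phase in parallel over shards of the logs."""
    return Counter(extract(line).get(field) for line in lines)

print(count_by("status", raw_logs))  # Counter({'200': 2, '500': 1})
```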
Less technical Splunk highlights include: Read more
Categories: Analytic technologies, Fox and MySpace, Investment research and trading, Log analysis, Splunk, Telecommunications, Text, Web analytics | 1 Comment |
Issues in scientific data management
In the opinion of the leaders of the XLDB and SciDB efforts, key requirements for scientific data management include:
- A data model based on multidimensional arrays, not sets of tuples
- A storage model based on versions and not update in place
- Built-in support for provenance (lineage), workflows, and uncertainty
- Scalability to hundreds of petabytes and thousands of nodes, with a high degree of fault tolerance
- Support for “external” data objects so that data sets can be queried and manipulated without ever having to be loaded into the database
- Open source, in order to foster a community of contributors and to ensure that data is never “locked up” — a critical requirement for scientists
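The first two requirements are easy to sketch in miniature: cells are addressed by array coordinates rather than found by scanning tuples, and every write creates a new queryable version instead of overwriting the old one. A toy model using numpy (my illustration; a real engine would store per-chunk deltas, not full copies):

```python
# Toy model of two SciDB-ish ideas: an array data model, and versioned
# (no-update-in-place) storage. Illustrative only -- not SciDB's engine.
import numpy as np

class VersionedArray:
    def __init__(self, initial):
        self.versions = [np.array(initial)]   # every write appends a version

    def write(self, index, value):
        new = self.versions[-1].copy()        # never update in place
        new[index] = value                    # (a real engine stores deltas)
        self.versions.append(new)

    def read(self, version=-1):
        return self.versions[version]         # old versions stay queryable

# A 2-D cell is addressed by coordinates, not found by scanning tuples.
sky = VersionedArray(np.zeros((3, 3)))
sky.write((1, 2), 42.0)                       # a recalibrated pixel, say

print(sky.read()[1, 2])           # 42.0 -- latest version
print(sky.read(version=0)[1, 2])  # 0.0  -- provenance: the original survives
```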
However: Read more